Re: [GSoC] Extend the static analysis pass

2020-03-24 Thread David Malcolm via Gcc
On Tue, 2020-03-24 at 10:18 -0500, Nader Al Awar wrote:
> Hello,
> 
> I am a master's student at UT Austin and I am interested in working
> on
> extending the static analysis pass project as part of GSoC.

Hello, I'm the author/maintainer of the static analysis pass, and would
be the mentor for the GSoC project.

Several students have expressed an interest in this project, but AFAIK
no-one has yet submitted a formal application.

I wrote a fairly broad project idea on the GCC wiki page, so there may
be room for more than one student.

> Specifically, I'm interested in both adding C++ support for
> new/delete
> and adding plugin support.

There's plenty of work to be done to add C++ support.

"Adding plugin support" makes most sense if there's a specific plugin
for using the plugin interface, so I'm interested in seeing what ideas
you might have as a proof-of-concept plugin (e.g. is there a particular
domain-specific checker you're interested in adding?).

> Most of my background is in software engineering and testing:
> building
> tools to find bugs and improving software reliability through
> testing,
> but I do have some experience with static analysis. Most of my
> programming experience is in C++ and C to a lesser extent. Also, I
> have always been interested in working on a compiler project, so this
> project seems like a good opportunity to combine my past experience
> with my other interests.

This sounds like a good skill set.

> I already worked through the “Before You Apply” section and I was
> wondering whether there are any tasks I need to do before I submit an
> official application.

Your email seems like a good start.

I'm new to being a GSoC mentor, so I'm afraid I'm a little hazy on the
details.

> PS: I am resending this email because I accidentally sent it as HTML
> before
> 
> Sincerely,
> Nader Al Awar

Hope this is helpful
Dave



Blog post about static analyzer in GCC 10

2020-03-26 Thread David Malcolm via Gcc
I wrote a blog post "Static analysis in GCC 10" giving an idea of the
current status of the -fanalyzer feature:
https://developers.redhat.com/blog/2020/03/26/static-analysis-in-gcc-10/

At some point I'll write up the material for our changes.html page.

Dave



Re: GSoC Static Analysis

2020-03-26 Thread David Malcolm via Gcc
On Wed, 2020-03-25 at 15:36 -0700, Andrew Briand via Gcc wrote:
> Hello,
> 
> I am an undergrad interested in extending GCC’s static analysis pass
> for GSoC 2020. In particular, I’m interested in adding C++ support. 

Hi Andrew, thanks for your interest in the project.

> The selected project ideas list mentions adding new/delete checking
> and exception checking. The features that immediately come to my mind
> would be checking for undeleted allocations, mixing delete and
> delete[], double deletion (it seems the current static analyzer
> already checks for double free), and uncaught exceptions.

This sounds like a good list.  The analyzer currently ignores parts of
GCC's IR relating to exceptions, so properly supporting them will
require some new code.

> What would the expected scope of this project be? All of these
> features sound interesting to me, but I have no idea if doing all of
> them would be feasible within GSoC.

I kept the scope of the proposal quite broad, as there's plenty of work
to be done on the analyzer.

Several people have expressed an interest in the project (and, indeed,
in the C++ part of it).  There may be room for more than one analyzer-
related project if we carve things out appropriately.  I think
realistically I don't have the bandwidth for more than two people (and
even that may be pushing it, especially given the disruption we're all
facing).

> For information about my experience, I have about a year and a half
> of C++ experience (about nine months in a large code base), have
> written a few toy compilers in the past, and will soon be starting to
> take a formal course about compilers at my university.

Sounds good.

A good next step for those interested in the project might be to try
compiling the analyzer from source.

David



Re: C (not C++) compiler plugins

2020-04-24 Thread David Malcolm via Gcc
On Fri, 2020-04-24 at 13:03 -0600, Maurice Smulders via Gcc wrote:
> Hello,
> 
> Hugo Landau figured out why it didn't load:
> Yes.
> 
> 
> The reference to cp_global_trees appears to be caused by the below
> code,
> which only relates to C++. For C, try commenting it out like this:
> 
> OUTF ("- !compex/method\n", i);
> OUTF ("  name: %s\n", method_name);
> OUTF ("  asm: %s\n", mangled_name);
> #if 0
> _bool("  virtual", DECL_VIRTUAL_P(arg));
> _bool("  artificial", DECL_ARTIFICIAL(arg));
> _bool("  const", DECL_CONST_MEMFUNC_P(arg));
> _bool("  static", DECL_STATIC_FUNCTION_P(arg));
> _bool("  constructor", DECL_CONSTRUCTOR_P(arg));
> _bool("  destructor", DECL_DESTRUCTOR_P(arg));
> _bool("  copyconstructor", DECL_COPY_CONSTRUCTOR_P(arg));
> _bool("  baseconstructor", DECL_BASE_CONSTRUCTOR_P(arg));
> _bool("  completeconstructor",
> DECL_COMPLETE_CONSTRUCTOR_P(arg));
> _bool("  completedestructor",
> DECL_COMPLETE_DESTRUCTOR_P(arg));
> _bool("  operator", DECL_OVERLOADED_OPERATOR_P(arg));
> _bool("  castoperator", DECL_CONV_FN_P(arg));
> _bool("  thunk", DECL_THUNK_P(arg));
> _bool("  nothrow", TYPE_NOTHROW_P(TREE_TYPE(arg)));
> #endif
> 
> However, trying to run it on a C file like
> 
> struct __attribute__((compex_tag("x"))) foo { int x; };
> int main(int argc, char **argv) { return 0; }
> 
> results in a segfault at this line in `_finish_type`:
> 
>   const char *struct_name = decl ?
> IDENTIFIER_POINTER(DECL_NAME(decl)) : NULL;
> 
> It appears that the pointer DECL_NAME(decl) is corrupt, 

Maybe DECL_NAME(decl) is NULL?
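
(Anonymous types, for instance, have a NULL DECL_NAME.)  A null-safe
variant along these lines might confirm it - an untested sketch, using
the same macros as the quoted code:

  const char *struct_name = NULL;
  if (decl && DECL_NAME (decl))
    struct_name = IDENTIFIER_POINTER (DECL_NAME (decl));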

> but I can't
> figure out why that is. I'm no expert on writing GCC plugins and this
> was an amateur attempt, I'm not sure I ever tested it with C.
> 
> But it still has a problem...
> 
> Has this functionality been used for the C compiler, if not, how can
> I
> debug this the easiest way?

Yes, plugins do work with cc1 (I first got into GCC development by
writing plugins, and most of my work was on analyzing C code).

I wrote a guide to debugging GCC which you may find helpful:
https://dmalcolm.fedorapeople.org/gcc/newbies-guide/debugging.html

FWIW it's possible to write a plugin that will work with both the C and
C++ frontends while accessing FE-specific things like cp_global_trees
without rebuilding, by using weak symbols.  This is something of a
hack, but I've used it successfully in a few places in my gcc-python-
plugin.  See e.g.:
https://github.com/davidmalcolm/gcc-python-plugin/blob/master/gcc-python-tree.c#L51
for an example of using decl_as_string from the C++ FE if available,
and it being NULL in other FEs.
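
The idea boils down to something like this (an illustrative sketch from
memory, not the exact code at that link; check cp/cp-tree.h for the
precise decl_as_string prototype):

/* Weak declaration of a C++-FE entrypoint: the plugin still loads
   under cc1, where the symbol simply resolves to NULL.  */
extern const char *decl_as_string (tree, int) __attribute__ ((weak));

static void
print_decl_name (tree decl)
{
  if (decl_as_string)
    fprintf (stderr, "%s\n", decl_as_string (decl, 0));  /* C++ FE.  */
  else if (DECL_NAME (decl))
    fprintf (stderr, "%s\n", IDENTIFIER_POINTER (DECL_NAME (decl)));
}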

Hope this is helpful; good luck.
Dave


> 
> Kind regards,
> 
> Maurice Smulders
> 
> On Fri, Apr 24, 2020 at 9:17 AM Maurice Smulders
>  wrote:
> > Hello,
> > 
> > Is it possible to make plugins for the C compiler (not the C++)
> > compiler? I was trying the (old) sample code at
> > https://github.com/hlandau/compex to make a plugin, but the plugin
> > only works with C++. when trying to use the C compiler it complains
> > about
> > 
> >  gcc -fplugin=/usr/local/lib/compex_gcc.so -D__COMPEX__=1
> > -fplugin-arg-compex_gcc-o=test_c.compex  -o test_c test_c.c
> > cc1: error: cannot load plugin /usr/local/lib/compex_gcc.so
> >/usr/local/lib/compex_gcc.so: undefined symbol: cp_global_trees
> > 
> > Is the mechanism different, or are plugins even supported on the
> > C compiler?




Re: Help porting a plugin to more recent GCC

2020-05-12 Thread David Malcolm via Gcc
On Tue, 2020-05-12 at 11:12 +0200, Sebastian Kürten wrote:
> Hi everybody,
> 
> I'm trying to adapt an existing, open source GCC plugin so that it
> will
> work with more recent versions of GCC (it is currently working with
> 4.7
> only). During my research I came across your suggestion on the
> Wiki[1]
> to get in touch if one has any questions concerning developing
> plugins,
> so I'll try this and see if anybody would be so kind to give me a
> little
> guidance!
> 
> The plugin is GCC-Bridge of the Renjin project which has been
> discussed
> on this mailing list before[2]. It is part of an effort to create a
> JVM
> runtime for the R language. The GCC-Brigde plugin compiles Gimple to
> JVM
> bytecode to make that runtime possible. The original project lives
> here[3], however, I have created a fork[4] that concentrates on just
> the part that compiles C code to JVM bytecode. The plugin is
> currently
> written using the plugin API of GCC 4.7. Since 4.7 is not available
> on
> my current Ubuntu-based system any longer, I would like to migrate to
> a
> newer version. 4.8 is available on my system, so migrating to that
> version would suffice as a first step. I tried that, however
> compilation fails using gcc-4.8 and after some reading the docs and
> going through the GCC source code history it seems that 4.7 to 4.8
> was
> a rather big evolution.
> 
> If anyone wants to take a look at the error messages, I created a
> branch[5] that has everything set up so that you can just run the
> compiler and see what happens; the README[6] file contains the
> necessary compilation instructions. It also shows the current output
> of
> gcc-4.8 and the error messages it produces. The plugin consists of a
> single file[7]. It seems that a global variable called
> "varpool_nodes"
> is not available anymore and also the members of the struct
> varpool_node changed.

4.8 is rather ancient at this point.

Looking at gcc/ChangeLog-2012 I see a change 

  2012-04-16  Jan Hubicka  

in which, if I'm reading it right, varpool_nodes was removed in favor
of a symtab_nodes function (combining both variables and callgraph
nodes).  In later releases (I think) they got encapsulated into a
symtab class.

There's also a FOR_EACH_VARIABLE macro that might give you what you
need.
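
Roughly like this (written from memory, so please double-check the
field names against cgraph.h for whichever release you target):

struct varpool_node *vnode;
FOR_EACH_VARIABLE (vnode)
  {
    /* In 4.8 the decl sits inside the embedded symbol member; later
       releases flatten this to vnode->decl.  */
    tree decl = vnode->symbol.decl;
    if (DECL_NAME (decl))
      fprintf (stderr, "variable: %s\n",
               IDENTIFIER_POINTER (DECL_NAME (decl)));
  }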

>  I haven't been able to figure out a way to
> traverse the Gimple tree and data structures the way the plugin did
> with the older API. If anyone here is familiar with the changes to
> the
> plugin API from 4.7 to 4.8, maybe you have a few hints for me?
> Pointers
> to a different plugin that went through a migration from 4.7 to a
> newer
> version could also be very very helpful. Any ideas?

My gcc-python-plugin attempts to support gcc 4.6 onwards from one
source tree, so has a lot of nasty compatibility cruft:
  https://github.com/davidmalcolm/gcc-python-plugin
which might be helpful (or might not).

Hope this is constructive
Dave


> 
> Thank you!
> Sebastian
> 
> [1] https://gcc.gnu.org/wiki/plugins
> [2] https://gcc.gnu.org/legacy-ml/gcc/2016-02/msg4.html
> [3] https://github.com/bedatadriven/renjin/
> [4] https://github.com/mobanisto/gcc-bridge
> [5] https://github.com/mobanisto/gcc-bridge/tree/gcc-4.8
> [6] https://github.com/mobanisto/gcc-bridge/blob/gcc-4.8/README.md
> [7] 
> https://github.com/mobanisto/gcc-bridge/blob/gcc-4.8/compiler/src/main/resources/org/renjin/gcc/plugin.c
> 



Re: New mklog script

2020-05-15 Thread David Malcolm via Gcc
On Fri, 2020-05-15 at 10:59 +0200, Martin Liška wrote:
> Hi.
> 
> Since we moved to git world and we're in the preparation for
> ChangeLog messages
> being in git commit messages, I think it's the right time to also
> simplify mklog
> script.
> 
> I'm sending a new version (which should eventually replace
> contrib/mklog and contrib/mklog.pl).
> Changes made in the version:
> 
> - the script uses unifdiff - it rapidly simplifies parsing of the '+-
> !' lines that is done
>in contrib/mklog
> - no author nor date stamp is used - that all can be get from git
> - --inline option is not supported - I don't see a use-case for it
> now
> - the new script has a unit tests (just few of them for now)
> 
> I compares results in between the old Python script for last 80
> commits and it's very close,
> in some cases it does even better.
> 
> I'm planning to maintain and improve the script for the future.
> 
> Thoughts?
> Martin

> +class TestMklog(unittest.TestCase):
> +    def test_macro_definition(self):
> +        changelog = generate_changelog(PATCH1)
> +        assert changelog == EXPECTED1
> +
> +    def test_changed_argument(self):
> +        changelog = generate_changelog(PATCH2)
> +        assert changelog == EXPECTED2
> +
> +    def test_enum_and_struct(self):
> +        changelog = generate_changelog(PATCH3)
> +        assert changelog == EXPECTED3
> +
> +    def test_no_function(self):
> +        changelog = generate_changelog(PATCH3, True)
> +        assert changelog == EXPECTED3B

Use self.assertEqual(a, b) rather than assert a == b, so that if it
fails you get a multiline diff:

e.g.:

import unittest

class TestMklog(unittest.TestCase):
    def test_macro_definition(self):
        self.assertEqual('''
first
second
third''', '''
first
SECOND
third''')

unittest.main()


has this output:

F
======================================================================
FAIL: test_macro_definition (__main__.TestMklog)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/tmp/foo.py", line 11, in test_macro_definition
    third''')
AssertionError: '\nfirst\nsecond\nthird' != '\nfirst\nSECOND\nthird'
  
  first
- second
+ SECOND
  third

----------------------------------------------------------------------
Ran 1 test in 0.000s

FAILED (failures=1)

which is much easier to debug than the output from assert a == b, which
is just:

F
======================================================================
FAIL: test_macro_definition (__main__.TestMklog)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/tmp/foo.py", line 11, in test_macro_definition
    third''')
AssertionError

----------------------------------------------------------------------
Ran 1 test in 0.000s

FAILED (failures=1)



Re: New mklog script

2020-05-15 Thread David Malcolm via Gcc
On Fri, 2020-05-15 at 13:20 +0200, Martin Liška wrote:
> On 5/15/20 12:58 PM, David Malcolm wrote:
> > On Fri, 2020-05-15 at 10:59 +0200, Martin Liška wrote:
> > > Hi.
> > > 
> > > Since we moved to git world and we're in the preparation for
> > > ChangeLog messages
> > > being in git commit messages, I think it's the right time to also
> > > simplify mklog
> > > script.
> > > 
> > > I'm sending a new version (which should eventually replace
> > > contrib/mklog and contrib/mklog.pl).
> > > Changes made in the version:
> > > 
> > > - the script uses unifdiff - it rapidly simplifies parsing of the
> > > '+-
> > > !' lines that is done
> > > in contrib/mklog
> > > - no author nor date stamp is used - that all can be get from git
> > > - --inline option is not supported - I don't see a use-case for
> > > it
> > > now
> > > - the new script has a unit tests (just few of them for now)
> > > 
> > > I compares results in between the old Python script for last 80
> > > commits and it's very close,
> > > in some cases it does even better.
> > > 
> > > I'm planning to maintain and improve the script for the future.
> > > 
> > > Thoughts?
> > > Martin
> > > +class TestMklog(unittest.TestCase):
> > > +    def test_macro_definition(self):
> > > +        changelog = generate_changelog(PATCH1)
> > > +        assert changelog == EXPECTED1
> > > +
> > > +    def test_changed_argument(self):
> > > +        changelog = generate_changelog(PATCH2)
> > > +        assert changelog == EXPECTED2
> > > +
> > > +    def test_enum_and_struct(self):
> > > +        changelog = generate_changelog(PATCH3)
> > > +        assert changelog == EXPECTED3
> > > +
> > > +    def test_no_function(self):
> > > +        changelog = generate_changelog(PATCH3, True)
> > > +        assert changelog == EXPECTED3B
> 
> Thank you David for review.
> 
> However I see the same output for both operator== and assertEqual.
> Probably
> because of usage of pytest version 4?

Ah, yes.  pytest does "magical" things with frame inspection IIRC to
scrape the locals out of the failing python stack frame.

Dave



Re: ERR: file not changed in a patch:"gcc/cp/cp-tree.c"

2020-05-19 Thread David Malcolm via Gcc
On Tue, 2020-05-19 at 13:03 -0600, Martin Sebor via Gcc wrote:
> I'm having trouble with the commit hook that tries to enforce
> ChangeLog contents.  It fails with an error that doesn't make
> sense to me: the file it complains isn't mentioned clearly is
> listed there and I can't tell what about how it's mentioned
> the hook is having a problem with.

If it's a bug in the hook, it would probably be helpful if you posted
the commit that it refused to accept.

It looks to me like when the hook parsed the patch part of your commit
that it erroneously decided the hunks changing gcc/cp/cp-tree.c were
actually to "gcc/cp/tree.c" - but that's guesswork on my part.

Hope this is helpful
Dave

> Thanks
> Martin
> 
> $ git push
> Enumerating objects: 23, done.
> Counting objects: 100% (23/23), done.
> Delta compression using up to 16 threads
> Compressing objects: 100% (12/12), done.
> Writing objects: 100% (12/12), 2.41 KiB | 2.41 MiB/s, done.
> Total 12 (delta 11), reused 0 (delta 0)
> remote: *** ChangeLog format failed:
> remote: ERR: file not changed in a patch:"gcc/cp/cp-tree.c"
> remote: ERR: changed file not mentioned in a
> ChangeLog:"gcc/cp/tree.c"
> remote:
> remote: Please see: 
> https://gcc.gnu.org/codingconventions.html#ChangeLogs
> remote:
> remote: error: hook declined to update refs/heads/master
> To git+ssh://gcc.gnu.org/git/gcc.git
>   ! [remote rejected] master -> master (hook declined)
> error: failed to push some refs to 
> 'git+ssh://mse...@gcc.gnu.org/git/gcc.git'
> 
> $ head gcc/cp/ChangeLog
> 2020-05-18  Martin Sebor  
> 
>   PR c++/94923
>   * call.c ((maybe_warn_class_memaccess): Use
> is_byte_access_type.
>   * cp-tree.h (is_dummy_object): Return bool.
>   (is_byte_access_type): Declare new function.
>   * cp-tree.c (is_dummy_object): Return bool.
>   (is_byte_access_type): Define new function.
> 
> 2020-05-19  Patrick Palka  
> 



Re: Passing an string argument to a GIMPLE call

2020-06-27 Thread David Malcolm via Gcc
On Sat, 2020-06-27 at 21:27 +0800, Shuai Wang via Gcc wrote:
> Dear Richard,
> 
> Thanks for the info. My bad, I will need to append "\0" at the end of
> the
> string. Also, a follow-up question which I just cannot find an
> answer:
> typically in the plugin entry point:
> 
> virtual unsigned int execute(function *fun)
> 
> How do I know which C files I am instrumenting? Can I somehow get the
> name
> of the C file? I don't find a corresponding pointer in the function
> struct.

fun->function_start_locus and fun->function_end_locus are the
location_t for the start and end of the function; also, each gimple
stmt has a location_t (although this isn't always set for every stmt).

Given a location_t, you can use LOCATION_FILE (loc) to get the source
file (and various other macros and accessors, see input.h)
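
For example, something like this from your execute hook (a sketch only;
the helper name is made up):

static void
report_source_file (function *fun)
{
  location_t loc = fun->function_start_locus;
  const char *file = LOCATION_FILE (loc);
  if (file)
    fprintf (stderr, "instrumenting %s\n", file);
}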

Hope this is helpful
Dave

> Best,
> Shuai
> 
> On Sat, Jun 27, 2020 at 9:12 PM Richard Biener <
> richard.guent...@gmail.com>
> wrote:
> 
> > On June 27, 2020 6:21:12 AM GMT+02:00, Shuai Wang via Gcc <
> > gcc@gcc.gnu.org>
> > wrote:
> > > Hello,
> > > 
> > > I am writing the following statement to make a GIMPLE call:
> > > 
> > >  tree function_fn_type =
> > > build_function_type_list(void_type_node,
> > > void_type_node, integer_type_node, NULL_TREE);
> > >  tree sancov_fndecl =
> > > build_fn_decl("my_instrumentation_function",
> > > function_fn_type);
> > > 
> > > auto gcall = gimple_build_call(sancov_fndecl, 2,
> > > build_string_literal(3, "foo"),
> > > build_int_cst_type(integer_type_node,
> > > 0));
> > > 
> > > However, when executing the GIMPLE plugin, while inducing no
> > > internal
> > > crash, the following function call statement is generated:
> > > 
> > >  my_instrumentation_function (*&"foo"[0]*, 0);
> > > 
> > > The first argument seems really strange. Can I somewhat just put
> > > a
> > > "foo"
> > > there instead of the current form? Thank you very much.
> > 
> > It looks correct. You are passing the address of the string
> > literal.
> > 
> > Richard.
> > 
> > > Best,
> > > Shuai



Re: gcc-backport problem on Debian 9

2020-07-13 Thread David Malcolm via Gcc
On Mon, 2020-07-13 at 08:39 +0200, Hans-Peter Nilsson via Gcc wrote:
> Again, Debian 9.  Doing "git gcc-backport a4aca1edaf37d43" on
> releases/gcc-10 gave me:
> 
> [releases/gcc-10 83cf5a7c6a5] PR94600: fix volatile access to the
> whole of a compound object.
>  Date: Sun Jul 5 20:50:52 2020 +0200
>  9 files changed, 276 insertions(+), 1 deletion(-)
>  create mode 100644 gcc/testsuite/gcc.dg/pr94600-1.c
>  create mode 100644 gcc/testsuite/gcc.dg/pr94600-2.c
>  create mode 100644 gcc/testsuite/gcc.dg/pr94600-3.c
>  create mode 100644 gcc/testsuite/gcc.dg/pr94600-4.c
>  create mode 100644 gcc/testsuite/gcc.dg/pr94600-5.c
>  create mode 100644 gcc/testsuite/gcc.dg/pr94600-6.c
>  create mode 100644 gcc/testsuite/gcc.dg/pr94600-7.c
>  create mode 100644 gcc/testsuite/gcc.dg/pr94600-8.c
> Traceback (most recent call last):
>   File "/mnt/storage1/hp/autotest/gccgit/gcc/contrib/git-
> backport.py", line 34, in 
> changelogs = subprocess.check_output(cmd, shell=True,
> encoding='utf8')
>   File "/usr/lib/python3.5/subprocess.py", line 316, in check_output
> **kwargs).stdout
>   File "/usr/lib/python3.5/subprocess.py", line 383, in run
> with Popen(*popenargs, **kwargs) as process:
> TypeError: __init__() got an unexpected keyword argument 'encoding'

This is subprocess.Popen.__init__, which is part of the Python standard
library.

https://docs.python.org/3.6/library/subprocess.html#popen-constructor
states:
  "New in version 3.6: 'encoding' and 'errors' were added."

Hence it looks like git-backport.py is implicitly assuming Python 3.6
or later.


> The commit looked fine with a "(cherry picked from commit
> a4aca1edaf37d43b2b7e9111825837a7a317b1b0)", appended to the
> commit log, so I pushed it successfully (using git am on the
> format-patch of this commit on another machine, so the sha above
> is not the final one, but 6f49c66ed4e060c333d8bcd).
> 
> Not sure what other information is needed, but maybe:
> 
> $ dpkg -s python3
> Package: python3
> Status: install ok installed
> Priority: optional
> Section: python
> Installed-Size: 67
> Maintainer: Matthias Klose 
> Architecture: amd64
> Multi-Arch: allowed
> Source: python3-defaults
> Version: 3.5.3-1
> Replaces: python3-minimal (<< 3.1.2-2)
> Provides: python3-profiler
> Depends: python3.5 (>= 3.5.3-1~), libpython3-stdlib (= 3.5.3-1), dh-
> python
> Pre-Depends: python3-minimal (= 3.5.3-1)
> Suggests: python3-doc (>= 3.5.3-1), python3-tk (>= 3.5.3-1~),
> python3-venv (>= 3.5.3-1)
> Description: interactive high-level object-oriented language (default
> python3 version)
>  Python, the high-level, interactive object oriented language,
>  includes an extensive class library with lots of goodies for
>  network programming, system administration, sounds and graphics.
>  .
>  This package is a dependency package, which depends on Debian's
> default
>  Python 3 version (currently v3.5).
> Homepage: http://www.python.org/
> 
> FWIW, I manually did "apt-get install python3-unidiff" and
> "apt-get install python3-dateutil" to deal with missing packages
> in other related scripts.  Perhaps this is a different
> incantation.  Are the dependencies listed somewhere?
> 
> brgds, H-P
> 



Re: Three issues

2020-07-22 Thread David Malcolm via Gcc
On Tue, 2020-07-21 at 22:49 +, Gary Oblock via Gcc wrote:
> Some background:
> 
> This is in the dreaded structure reorganization optimization that I'm
> working on. It's running at LTRANS time with '-flto-partition=one'.
> 
> My issues in order of importance are:
> 
> 1) In gimple-ssa.h, the equal method for ssa_name_hasher
> has a segfault because the "var" field of "a" is (nil).
> 
> struct ssa_name_hasher : ggc_ptr_hash<tree_node>
> {
>   /* Hash a tree in a uid_decl_map.  */
> 
>   static hashval_t
>   hash (tree item)
>   {
> return item->ssa_name.var->decl_minimal.uid;
>   }
> 
>   /* Return true if the DECL_UID in both trees are equal.  */
> 
>   static bool
>   equal (tree a, tree b)
>   {
>   return (a->ssa_name.var->decl_minimal.uid == b->ssa_name.var-
> >decl_minimal.uid);
>   }
> };

I notice that tree.h has:

/* Returns the variable being referenced.  This can be NULL_TREE for
   temporaries not associated with any user variable.
   Once released, this is the only field that can be relied upon.  */
#define SSA_NAME_VAR(NODE)  \
  (SSA_NAME_CHECK (NODE)->ssa_name.var == NULL_TREE \
   || TREE_CODE ((NODE)->ssa_name.var) == IDENTIFIER_NODE   \
   ? NULL_TREE : (NODE)->ssa_name.var)

So presumably that ssa_name_hasher is making an implicit assumption
that such temporaries aren't present in the hash_table; maybe they are
for yours?

Is this a hash_table that you're populating yourself?

With the caveat that I'm sleep-deprived, another way this could happen
is if "a" is not an SSA_NAME but is in fact some other kind of tree;
you could try replacing
  a->ssa_name.ver
with
  SSA_NAME_CHECK (a)->ssa_name.var
(and similarly for b)

But the first explanation seems more likely.
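
If your table does need to hold such temporaries, a null-tolerant
variant along these lines might work (just a sketch, assuming that
falling back to the SSA version number is acceptable for your
purposes):

  static hashval_t
  hash (tree item)
  {
    tree var = SSA_NAME_VAR (item);
    return var ? DECL_UID (var) : SSA_NAME_VERSION (item);
  }

  static bool
  equal (tree a, tree b)
  {
    tree va = SSA_NAME_VAR (a);
    tree vb = SSA_NAME_VAR (b);
    if (!va || !vb)
      return a == b;
    return DECL_UID (va) == DECL_UID (vb);
  }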


> 
[...snip qn 2...]


> 3) For my bug in (1) I got so distraught that I ran valgrind which
> in my experience is an act of desperation for compilers.
> 
> None of the errors it spotted are associated with my optimization
> (although it oh so cleverly pointed out the segfault) however it
> showed the following:
> 
> ==18572== Invalid read of size 8
> ==18572==at 0x1079DC1: execute_one_pass(opt_pass*)
> (passes.c:2550)

What is line 2550 of passes.c in your working copy?

> ==18572==by 0x107ABD3: execute_ipa_pass_list(opt_pass*)
> (passes.c:2929)
> ==18572==by 0xAC0E52: symbol_table::compile() (cgraphunit.c:2786)
> ==18572==by 0x9915A9: lto_main() (lto.c:653)
> ==18572==by 0x11EE4A0: compile_file() (toplev.c:458)
> ==18572==by 0x11F1888: do_compile() (toplev.c:2302)
> ==18572==by 0x11F1BA3: toplev::main(int, char**) (toplev.c:2441)
> ==18572==by 0x23C021E: main (main.c:39)
> ==18572==  Address 0x5842880 is 16 bytes before a block of size 88
> alloc'd
> ==18572==at 0x4C3017F: operator new(unsigned long) (in
> /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
> ==18572==by 0x21E00B7: make_pass_ipa_prototype(gcc::context*)
> (ipa-prototype.c:329)

You say above that none of the errors are associated with your
optimization, but presumably this is your new pass, right?  Can you
post the code somewhere?

> ==18572==by 0x106E987:
> gcc::pass_manager::pass_manager(gcc::context*) (pass-
> instances.def:178)
> ==18572==by 0x11EFCE8: general_init(char const*, bool)
> (toplev.c:1250)
> ==18572==by 0x11F1A86: toplev::main(int, char**) (toplev.c:2391)
> ==18572==by 0x23C021E: main (main.c:39)
> ==18572==
> 
> Are these known issues with lto or is this a valgrind issue?

Hope this is helpful
Dave



The encoding of GCC's stderr

2020-11-17 Thread David Malcolm via Gcc
As far as I can tell, GCC's diagnostic output on stderr is a mixture of
bytes from various different places in our internal representation:
- filenames
- format strings from diagnostic messages (potentially translated via
.po files)
- identifiers
- quoted source code
- fix-it hints
- labels

As noted in https://gcc.gnu.org/onlinedocs/cpp/Character-sets.html
source files can be in any character set, specified by -finput-charset=, and 
libcpp converts that to the "source character set", Unicode, encoding it 
internally as UTF-8.  String and character constants are then converted to the 
execution character set (defaulting to UTF-8-encoded Unicode).  In many places 
we use identifier_to_locale to convert from the "internal encoding" to the 
locale character set, falling back to converting non-ASCII characters to UCNs.  
I suspect that there are numerous places where we're not doing that, but ought 
to be.
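
(The pattern in question is roughly the following - a sketch, with decl
and loc standing in for whatever is at hand at the diagnostic site:)

  const char *id = IDENTIFIER_POINTER (DECL_NAME (decl));
  inform (loc, "%qs declared here", identifier_to_locale (id));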

The only test coverage I could find for -finput-charset is
gcc.dg/ucnid-16-utf8.c, which has a latin1-encoded source file, and
verifies that a latin-1 encoded variable name becomes UTF-8 encoded in
the resulting .s file.  I shudder to imagine a DejaGnu test for a
source encoding that's not a superset of ASCII (e.g. UCS-4) - how would
the dg directives be handled?  I wonder if DejaGnu can support tests in
which the compiler's locale is overridden with environment variables
(and thus having e.g. non-ASCII/non-UTF-8 output).

What is the intended encoding of GCC's stderr?

In gcc_init_libintl we call:

#if defined HAVE_LANGINFO_CODESET
  locale_encoding = nl_langinfo (CODESET);
  if (locale_encoding != NULL
      && (!strcasecmp (locale_encoding, "utf-8")
          || !strcasecmp (locale_encoding, "utf8")))
    locale_utf8 = true;
#endif

so presumably stderr ought to be nl_langinfo (CODESET).

We use the above to potentially use the UTF-8 encoding of U+2018 and
U+2019 for open/close quotes, falling back to ASCII for these.

As far as I can tell, we currently:
- blithely accept and emit filenames as bytes (I don't think we make
any attempt to enforce that they're any particular encoding)
- emit format strings in whatever encoding gettext gives us
- emit identifiers as char * from IDENTIFIER_POINTER, calling
identifier_to_locale on them in many places, but I suspect we're
missing some
- blithely emit quoted source code as raw bytes (this is PR
other/93067, which has an old patch attached; presumably the source
ought to be emitted to stderr in the locale encoding)
- fix-it hints can contain identifiers as char * from
IDENTIFIER_POINTERs, which is likely UTF-8; I think I'm failing to call
identifier_to_locale on them
- labels can contain type names, which are likely UTF-8, and I'm
probably failing to call identifier_to_locale on them

So I think our current policy is:
- we assume filenames are encoded in the locale encoding, and pass them
through as bytes with no encode/decode
- we emit to stderr in the locale encoding (but there are likely bugs
where we don't re-encode from UTF-8 to the locale encoding)

Does this sound correct?

My motivation here is the discussion in [1] and [2] of supporting Emacs
via an alternative output format for machine-readable fix-it hints,
which has made me realize that I didn't understand our current approach
to encodings as well as I would like.

Hope this is constructive
Dave

[1] https://debbugs.gnu.org/cgi/bugreport.cgi?bug=25987
[2] https://gcc.gnu.org/pipermail/gcc-patches/2020-November/559105.html



Re: GCC GSoC 2021 - Static analyzer project

2021-01-18 Thread David Malcolm via Gcc
On Thu, 2021-01-14 at 10:45 +0530, Adharsh Kamath wrote:
> Hello,
> I came across the list of possible project ideas for GSoC 2021 and
> I'd
> like to contribute to the project regarding the static analysis pass
> in GCC.
> How can I get started with this project?

Hi Adharsh

Sorry about the delay in responding to your email.

Thanks for your interest in the static analysis pass.

Some ideas on getting started with GCC are here:
  https://gcc.gnu.org/wiki/SummerOfCode#Before_you_apply

The analyzer has its own wiki page here:
  https://gcc.gnu.org/wiki/DavidMalcolm/StaticAnalyzer

I've actually already implemented some of the ideas that were on the
GSoC wiki page myself since last summer, so I've updated that page
accordingly:
  https://gcc.gnu.org/wiki/SummerOfCode?action=diff&rev2=187&rev1=184
I've added the idea of SARIF ( https://sarifweb.azurewebsites.net/ ) as
an output format for the static analyzer (and indeed, for the GCC
diagnostics subsystem as a whole).

Do any of the ideas on the page look appealing to you?  I'm open to
other ideas you may have relating to the analyzer, or indeed to gcc
diagnostics.

There's no shortage of things to work on in the analyzer (e.g. C++
support, etc).

Thoughts?

Thanks
Dave




Re: GCC GSoC 2021 - Static analyzer project

2021-01-22 Thread David Malcolm via Gcc
On Fri, 2021-01-22 at 20:46 +0530, Adharsh Kamath wrote:
> Hi David. Thank you for the reply.
> On Tue, Jan 19, 2021 at 2:12 AM David Malcolm 
> wrote:
> > On Thu, 2021-01-14 at 10:45 +0530, Adharsh Kamath wrote:
> > > Hello,
> > > I came across the list of possible project ideas for GSoC 2021
> > > and
> > > I'd
> > > like to contribute to the project regarding the static analysis
> > > pass
> > > in GCC.
> > > How can I get started with this project?
> > 
> > Hi Adharsh
> > 
> > Sorry about the delay in responding to your email.
> > 
> > Thanks for your interest in the static analysis pass.
> > 
> > Some ideas on getting started with GCC are here:
> >   https://gcc.gnu.org/wiki/SummerOfCode#Before_you_apply
> > 
> > The analyzer has its own wiki page here:
> >   https://gcc.gnu.org/wiki/DavidMalcolm/StaticAnalyzer
> 
> I examined the analyzer dumps for a few programs. I also read the
> documentation on the internals of
> the static analyzer and I've understood the basics of how the
> analyzer works.

Excellent.  Building GCC from source and stepping through it in the
debugger would be good next steps.  You'll need plenty of disk space.
 "run_checkers" is a good breakpoint to set if you're looking for the
entrypoint to the analyzer.

> > I've actually already implemented some of the ideas that were on
> > the
> > GSoC wiki page myself since last summer, so I've updated that page
> > accordingly:
> >   
> > https://gcc.gnu.org/wiki/SummerOfCode?action=diff&rev2=187&rev1=184
> > I've added the idea of SARIF ( https://sarifweb.azurewebsites.net/
> > ) as
> > an output format for the static analyzer (and indeed, for the GCC
> > diagnostics subsystem as a whole).
> > 
> > Do any of the ideas on the page look appealing to you?  I'm open to
> > other ideas you may have relating to the analyzer, or indeed to gcc
> > diagnostics.
> 
> Yes. Making a plugin for the Linux kernel seems very interesting to
> me.
> I'd also like to extend support for C++ but I'm not sure if both
> ideas would be
> possible, given the time constraints.

I think that picking just one would be better than trying to do both.

> How do I start with the plugin for
> the Linux kernel?

I added plugin support to the analyzer in:
https://gcc.gnu.org/git/?p=gcc.git;a=commitdiff;h=66dde7bc64b75d4a338266333c9c490b12d49825

There's an example plugin in that patch.  The kernel source tree
already has some plugins, so hopefully those together give some
pointers on how to write a "hello world" analyzer plugin that runs as
part of the kernel build, which would be another next step, I guess.
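
For reference, the bare bones of a GCC plugin look something like the
following (a minimal sketch, not the analyzer example from that
commit); it gets loaded via -fplugin=./hello_plugin.so:

#include "gcc-plugin.h"
#include "plugin-version.h"

int plugin_is_GPL_compatible;

int
plugin_init (struct plugin_name_args *plugin_info,
             struct plugin_gcc_version *version)
{
  /* Refuse to load into a GCC we weren't built against.  */
  if (!plugin_default_version_check (version, &gcc_version))
    return 1;
  fprintf (stderr, "hello from plugin %s\n", plugin_info->base_name);
  return 0;
}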

Unfortunately I'm not a Linux kernel developer, so I don't have deep
knowledge of what checks would be useful and the subtle details that
are likely to be necessary.  I'll try to reach out internally within
Red Hat - we have plenty of kernel developers here.

Some ideas:
* detecting code paths that acquire a lock but then fail to release it
* detecting code paths that disable interrupts and then fail to re-
enable them
* detecting mixups between user-space pointers and kernel-space
pointers

The kernel has its own checker called "smatch" which may give other
ideas for warnings.

The state machine checker in the analyzer takes its inspiration from
the Stanford "MC" checker (among other places, such as typestate),
which has been used to implement warnings for the Linux kernel, albeit
some very old versions of the kernel.

See:
  * "How to write system-specific, static checkers in Metal" (Benjamin
Chelf, Dawson R Engler, Seth Hallem), from 2002
  * "Checking system rules using system-specific, programmer-written
compiler extensions" Proceedings of Operating Systems Design and
Implementation (OSDI), September 2000. D. Engler, B. Chelf, A. Chou,
and S. Hallem.
  * "Using Programmer-Written Compiler Extensions to Catch Security
Holes" (Ken Ashcraft, Dawson Engler) from 2002

These are working on 20-year-old in-kernel APIs that might be obsolete
now, but they have examples of interrupt checking, and user-space vs
kernel-space pointer checking.

Focusing on error-handling paths in driver code might be best.

Does this answer your questions?

Hope this sounds interesting as a project
Dave




Static analysis updates in GCC 11

2021-01-28 Thread David Malcolm via Gcc
I wrote a blog post covering what I've been working on in the analyzer
in this release:
 
https://developers.redhat.com/blog/2021/01/28/static-analysis-updates-in-gcc-11/

Hope this is of interest
Dave




Re: Static analysis updates in GCC 11

2021-01-28 Thread David Malcolm via Gcc
On Thu, 2021-01-28 at 22:06 +0100, David Brown wrote:
> On 28/01/2021 21:23, David Malcolm via Gcc wrote:
> > I wrote a blog post covering what I've been working on in the
> > analyzer
> > in this release:
> >  
> > https://developers.redhat.com/blog/2021/01/28/static-analysis-updates-in-gcc-11/
> > 
> 
> As a gcc user, I am always glad to hear of more static analysis and
> static warning work.  My own work is mostly on small embedded
> systems,
> where "malloc" and friends are severely frowned upon in any case and
> there is no file system, so most of the gcc 10 -fanalyzer warnings
> are
> of no direct use to me.  (I still think they are great ideas - even
> if
> /I/ don't write much PC code, everyone benefits if there are fewer
> bugs
> in programs.)  I will get more use for the new warnings you've added
> for
> gcc 11.
> 
> 
> I wrote a feature request for gcc a while back, involving adding tag
> attributes to functions in order to ensure that certain classes of
> functions are only used from specific allowed functions.  The feature
> request attracted only a little interest at the time.  But I suspect
> it
> could work far better along with the kind of analysis you are doing
> with
> -fanalyzer than with the normal syntactical analyser in gcc.
> 
> <https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88391>

Interesting.  The attribute ideas seem designed to work with the
callgraph: partitioning the callgraph into families of functions for
which certain kinds of inter-partition edges are disallowed.  Can a
function change its tag internally, or is it assumed that a function
has a single tag throughout its whole body?  I see that you have a case
in example 3 where a compound statement is marked with an attribute
(which may be an extension of our syntax).

One thing I forgot to mention in the blog post is that the analyzer now
supports plugins; there's an example of a mutex-checking plugin here:
https://gcc.gnu.org/git/?p=gcc.git;a=commitdiff;h=66dde7bc64b75d4a338266333c9c490b12d49825
which is similar to your examples 1 and 3.  Your example 2 is also
reminiscent of the async-signal-unsafe checking that the analyzer has
(where it detects code paths that are called within a signal handler
and complains about bad calls within them).  Many of the existing
checks in the analyzer are modelled as state machines (either global
state for things like "are we in a signal handler", or per-value state
for things like "has this pointer been freed"), and your examples could
be modelled that way too (e.g. "what sections are in RAM" could be a
global state) - so maybe it could all be done as analyzer plugins, in
lieu of implementing the RFE you posted.

Hope this is constructive
Dave



Re: GCC GSoC 2021 - Static analyzer project

2021-02-11 Thread David Malcolm via Gcc
On Thu, 2021-02-11 at 22:35 +0530, Adharsh Kamath wrote:
> Hi David,
> 
> > Building GCC from source and stepping through it in the
> > debugger would be good next steps.  You'll need plenty of disk
> > space.
> >  "run_checkers" is a good breakpoint to set if you're looking for
> > the
> > entrypoint to the analyzer.
> > 
> 
> I tried this and I understood the control flow in the analyzer.
> 
> > There's an example plugin in that patch.  The kernel source tree
> > already has some plugins, so hopefully, those together give some
> > pointers on how to write a "hello world" analyzer plugin that runs
> > as
> > part of the kernel build, which would be another next step, I
> > guess.
> > 
> 
> I implemented a very simple hello world plugin here -
> https://github.com/adharshkamath/Hello-world-plugin.
> 
> It just prints a Hello message while building the Linux Kernel, if
> the
> -fanalyzer option is enabled. I referred to the example plugin in the
> static analyzer
> and the plugins in the kernel source to do this.

Excellent.

> > See::
> >   * "How to write system-specific, static checkers in Metal"
> > (Benjamin
> > Chelf, Dawson R Engler, Seth Hallem), from 2002
> >   * "Checking system rules using system-specific, programmer-
> > written
> > compiler extensions" Proceedings of Operating Systems Design and
> > Implementation (OSDI), September 2000. D. Engler, B. Chelf, A.
> > Chou,
> > and S. Hallem.
> >   * "Using Programmer-Written Compiler Extensions to Catch Security
> > Holes" (Ken Ashcraft, Dawson Engler) from 2002
> > 
> 
> These were useful and interesting to read. Thank you for suggesting
> them.
> Adharsh

Great.

I believe you're in a position to write a strong application to GSoC
for yourself for this project; you're well ahead of the timeline, as I
understand it:
  https://summerofcode.withgoogle.com/how-it-works/#timeline

Dave



Re: using undeclared function returning bool results in wrong return value

2021-02-20 Thread David Malcolm via Gcc
On Sat, 2021-02-20 at 15:25 +0100, David Brown wrote:
> On 19/02/2021 12:18, Jonathan Wakely via Gcc wrote:
> > On Fri, 19 Feb 2021 at 09:42, David Brown wrote:
> > > Just to be clear - I am not in any way suggesting that this
> > > situation is
> > > the fault of any gcc developers.  If configure scripts are
> > > failing
> > > because they rely on poor C code or inappropriate use of gcc
> > > (code that
> > > requires a particular C standard should specify it - gcc has the
> > > "-std="
> > > flags for that purpose), then the maintainers of those scripts
> > > should
> > > fix them.  If Fedora won't build just because the C compiler
> > > insists C
> > > code is written in C, then the Fedora folk need to fix their
> > > build system.
> > 
> > It's not Fedora's build system, it's the packages in Fedora's build
> > systems. Lots of them. And those same packages are in every other
> > Linux distro, so everybody needs to fix them.
> > 
> 
> It seems to me that there are two very different uses of gcc going on
> here.  (I'm just throwing up some ideas here - if people think they
> are
> daft, wrong or impractical, feel free to throw them out again!  I am
> trying to think of ways to make it easier for people to see that
> there
> are problems with their C or C++ code, without requiring impractical
> changes on large numbers of configuration files and build setups.)
> 
> gcc can be used as a development tool - it is an aid when writing
> code,
> and helps you write better code.  Here warnings of all sorts are
> useful,
> as it is better to find potential or real problems as early as
> possible
> in the development process.  Even warnings about style are important
> because they improve the long-term maintainability of the code.
> 
> gcc can also be used to build existing code - for putting together
> distributions, installing on your own machine, etc.  Here flags such
> as
> "-march=native" can be useful but non-critical warnings are not,
> because
> the person (or program) running the compiler is not a developer of
> the
> code.  This use is as a "system C compiler".

I think there's an important insight here, in that there's a
distinction between:

(a) the edit-compile-debug cycle where the user is actively hacking on
the code themself (perhaps a project they wrote, or someone else's),
where they just made a change to the code and want to see what
happens, 

as opposed to

(b) a batch rebuild setting, where the user is recompiling a package,
and GCC is a detail that's being invoked by a hierarchy of build
systems (e.g. a Fedora mass rebuild that invokes koji, that invokes
rpmbuild, that invokes some build tool, which eventually invokes gcc);
perhaps a dependency changed, and the user is curious about what breaks
(and hoping that nothing does, since they know nothing about this
particular code, maybe they're just trying to get the distro to boot on
some new architecture).

I think we need to think about both of these use-cases e.g. as we
implement our diagnostics, and that we should mention this distinction
in our UX guidelines...

> Is it possible to distinguish these uses, and then have different
> default flags?  Perhaps something as simple as looking at the name
> used
> to call the compiler - "cc" or "gcc" ?
> 

...but I'm wary of having an actual distinction between them in the
code; it seems like a way to complicate things and lead to "weird"
build failures.

Thought experiment: what might a "--this-is-my-code" option do?

Hope this is constructive
Dave



Constraints and branching in -fanalyzer

2021-02-20 Thread David Malcolm via Gcc
[Moving this discussion from offlist to the GCC mailing list (with
permission) and tweaking the subject]

On Sat, 2021-02-20 at 02:57 +, brian.sobulefsky wrote:
> Yeah, its a lot to take in. For the last one, it was just about
> storing and retrieving data and I ignored everything else about the
> analyzer, and that was hard enough.

Well done on making it this far; I'm impressed that you're diving into
some of the more difficult aspects of this code, and seem to be coping.

> I am working on PR94362, which originates from a false positive found
> compiling openssl. It effectivly amounted to not knowing that idx >=
> 0 within the loop for(; idx-- >0 ;).
> 
> It turns out there are two problems here. One has to do with the
> postfix operator, and yes, the analyzer currently does not know that
> i >= 0 within an if block like if(idx-- > 0). That problem was easy
> and I got it to work within a few days with a relatively simple
> patch. I thought I was going to be submitting again.
> 
> The second part is hard. It has to do with state merging and has
> nothing to do with the postfix operator. It fails for all sorts of
> operators when looping. In fact, the following fails:
> 
> if(idx < 0)
>   idx = 0;
> __analyzer_eval(idx >= 0);
> 
> which is devastating if you are hoping the analyzer can "understand"
> a loop. Even with my first fix (which gives one TRUE when run on a
> for loop), there is the second "iterated" pass, which ends up with a
> widening_svalue (I'm still trying to wrap my head around that one
> too), that gives an UNKNOWN

FWIW "widening" in this context is taken from abstract interpretation;
see e.g. the early papers by Patrick and Radhia Cousot; the idea is to
jump ahead of an infinitely descending chain of values to instead go
directly to a fixed point in a (small) finite number of steps.  (I've
not attempted the narrowing approach, which refines it further to get a
tighter approximation).


> So I am trying to follow how states are merged, but that means I need
> to at least sort of understand the graphs. I do know that the actual
> merging follows in the PK_AFTER_SUPERNODE branch, with the call to
> node->on_edge, which eventually gets you to maybe_update_for_edge and
> the for_each_fact iterator.

I have spent far too many hours poring over graph dumps from the
analyzer, and yes, grokking the state merging is painful, and I'm sure
there are many bugs.

Are you familiar with the various dump formats for the graph?  In
particular the .dot ones?  FWIW I use xdot.py for viewing them:
  https://github.com/jrfonseca/xdot.py
(and indeed am the maintainer of the Fedora package for it); it has a
relatively quick and scalable UI for navigating graphs, but at some
point even it can't cope.
I started writing a dedicated visualizer that uses some of xdot.py's
classes:
  https://github.com/davidmalcolm/gcc-analyzer-viewer
but it's early days for that.



> I watched a merge in the debugger yesterday for the if example above
> and watched the unknown_svalues be made for the merged state, but it
> was still too much to take in all at once for me to know where the
> solution is.

One other nasty problem with the state merging code is that any time I
touch it, there are knock-on effects where other things break (e.g.
loop analysis stops converging), and as I fix those, yet more things
break, which is demoralizing (3 steps forward, 2 steps back).

Finding ways to break problems down into smaller chunks seems to be the
key here.

It sounds like you're making progress with the:

  if (idx < 0)
 idx = 0;
  __analyzer_eval (idx >= 0);

case.  Does your fix at least work outside of a loop, without
regressing things? (Or, if it does, what regresses?)  If so, then it
could be turned into a minimal testcase and we could at least fix that.
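
Something like this, say (a sketch - the filename is just a suggestion,
and the dg-warning strings follow the style of the existing
gcc.dg/analyzer tests):

/* e.g. gcc.dg/analyzer/clamp-1.c  */
#include "analyzer-decls.h"

void test (int idx)
{
  if (idx < 0)
    idx = 0;
  __analyzer_eval (idx >= 0); /* { dg-warning "TRUE" } */
}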


FWIW I've experimented with better ways to handle loops in the
analyzer.  One approach is that GCC already has its own loop analysis
framework.  At the point where -fanalyzer runs, the IR has captured the
nesting structure of loops in the code, so we might want to make use of
that in our heuristics.  Sadly, as far as I can tell, any attempts to
go beyond that and reuse GCC's scalar-value-evolution code (for
handling loop bounds and iterations) require us to enable modification
of the CFGs, which is a no-no for -fanalyzer.

(Loop handling is one of the most important and difficult issues within
the analyzer implementation.  That said, in the last few days I've been
ignoring it, and have been focusing instead on a rewrite of how we find
the shortest feasible path for each diagnostic, since there's another
cluster of analyzer bugs relating to false negatives in that; I hope to
land a big fix for feasibility handling next week - and to finish
reviewing your existing patch (sorry!))

Hope this is helpful.  I'm quoting the rest of our exchange below (for
the mailing list)

Dave



Re: Constraints and branching in -fanalyzer

2021-02-22 Thread David Malcolm via Gcc
On Sun, 2021-02-21 at 05:27 +, brian.sobulefsky wrote:
> To be clear, I only solved the lesser problem
> 
> if(idx-- > 0)
>   __analyzer_eval(idx >= 0);
> 
> which is a stepping stone problem. I correctly surmised that this was
> failing
> (with the prefix operator and -= operator working as expected)
> because the
> condition that is constrainted in the postfix problem is the old
> value for idx
> while the condition being evaluated is the new value. I can send you
> a patch,
> but the short version is the initial value of idx is constrained,
> then a binop_svalue
> is stored and eventually ends up in __analyzer_eval. Adding a case in
> constraint_manager::eval_condition to take apart binop svalues and
> recur
> the way you are imagining would be necessary is basically all that is
> needed
> to solve that one. Currently, the constraint_manager is just looking
> at
> that binop_svalue and determining it does not know any rules for it,
> because
> the rule it knows about is actually for one of its arguments.
> 
> I was hoping this would be it for the original loop problem, but like
> I said,
> it goes from saying "UNKNOWN" twice to saying "TRUE UNKNOWN" which I
> found out happens for the other operators in a for loop as well. The
> first
> true is my binop_svalue handler, but the second UNKNOWN is the
> merging of
> the bottom of the loop back with the entry point.
> 
> Since that is fairly abstract, when I found the case I told you
> about,
> I decided to see if I could fix it, because merging >0 with =0 into
> >=0
> for a linear CFG should not be too hard.

I think it's probably best if you post the patch that you have so far
(which as I understand it fixes the non-looping case), since it sounds
like a useful base to work from.

Thanks
Dave

> 
> ‐‐‐ Original Message ‐‐‐
> On Saturday, February 20, 2021 12:42 PM, David Malcolm <
> dmalc...@redhat.com> wrote:
> 
> > [Moving this discussion from offlist to the GCC mailing list (with
> > permission) and tweaking the subject]
> > 
> > On Sat, 2021-02-20 at 02:57 +, brian.sobulefsky wrote:
> > 
> > > Yeah, its a lot to take in. For the last one, it was just about
> > > storing and retrieving data and I ignored everything else about
> > > the
> > > analyzer, and that was hard enough.
> > 
> > Well done on making it this far; I'm impressed that you're diving
> > into
> > some of the more difficult aspects of this code, and seem to be
> > coping.
> > 
> > > I am working on PR94362, which originates from a false positive
> > > found
> > > compiling openssl. It effectivly amounted to not knowing that idx
> > > >=
> > > 0 within the loop for(; idx-- >0 ;).
> > > It turns out there are two problems here. One has to do with the
> > > postfix operator, and yes, the analyzer currently does not know
> > > that
> > > i >= 0 within an if block like if(idx-- > 0). That problem was
> > > easy
> > > and I got it to work within a few days with a relatively simple
> > > patch. I thought I was going to be submitting again.
> > > The second part is hard. It has to do with state merging and has
> > > nothing to do with the postfix operator. It fails for all sorts
> > > of
> > > operators when looping. In fact, the following fails:
> > > if(idx < 0)
> > >   idx = 0;
> > > __analyzer_eval(idx >= 0);
> > > which is devastating if you are hoping the analyzer can
> > > "understand"
> > > a loop. Even with my first fix (which gives one TRUE when run on
> > > a
> > > for loop), there is the second "iterated" pass, which ends up
> > > with a
> > > widening_svalue (I'm still trying to wrap my head around that one
> > > too), that gives an UNKNOWN
> > 
> > FWIW "widening" in this context is taken from abstract
> > interpretation;
> > see e.g. the early papers by Patrick and Radhia Cousot; the idea is
> > to
> > jump ahead of an infinitely descending chain of values to instead
> > go
> > directly to a fixed point in a (small) finite number of steps.
> > (I've
> > not attempted the narrowing approach, which refines it further to
> > get a
> > tighter approximation).
> > 
> > > So I am trying to follow how states are merged, but that means I
> > > need
> > > to at least sort of understand the graphs. I do know that the
> > > actual
> > > merging follows in the PK_AFTER_SUPERNODE branch, with the call
> > > to
> > > node->on_edge, which eventually gets you to maybe_update_for_edge
> > > and
> > > the for_each_fact iterator.
> > 
> > I have spent far too many hours poring over graph dumps from the
> > analyzer, and yes, grokking the state merging is painful, and I'm
> > sure
> > there are many bugs.
> > 
> > Are you familiar with the various dump formats for the graph? In
> > particular the .dot ones? FWIW I use xdot.py for viewing them:
> > https://github.com/jrfonseca/xdot.py
> > (and indeed am the maintainer of the Fedora package for it); it has
> > a
> > relatively quick and scalable UI for navigating graphs, but at some
> > point even it can't cope.
> > I started writing a 

[PATCH] docs: add interactive vs batch distinction to UX guidelines

2021-02-22 Thread David Malcolm via Gcc
On Sat, 2021-02-20 at 17:49 +0100, David Brown wrote:
> 
> 
> On 20/02/2021 16:46, David Malcolm wrote:
> > On Sat, 2021-02-20 at 15:25 +0100, David Brown wrote:
> 
> 
> > 
> > I think we need to think about both of these use-cases e.g. as we
> > implement our diagnostics, and that we should mention this
> > distinction
> > in our UX guidelines...
> > 
> > > Is it possible to distinguish these uses, and then have different
> > > default flags?  Perhaps something as simple as looking at the
> > > name
> > > used
> > > to call the compiler - "cc" or "gcc" ?
> > > 
> > 
> > ...but I'm wary of having an actual distinction between them in the
> > code; it seems like a way to complicate things and lead to "weird"
> > build failures.
> > 
> 
> Fair enough.

[...snip...]

How about the following addition to the User Experience Guidelines?

gcc/ChangeLog:
* doc/ux.texi: Add subsection contrasting interactive versus
batch usage of GCC.
---
 gcc/doc/ux.texi | 25 +
 1 file changed, 25 insertions(+)

diff --git a/gcc/doc/ux.texi b/gcc/doc/ux.texi
index fdba5da1598..28d5994d10f 100644
--- a/gcc/doc/ux.texi
+++ b/gcc/doc/ux.texi
@@ -86,6 +86,31 @@ information to allow the user to make an informed choice 
about whether
 they should care (and how to fix it), but a balance must be drawn against
 overloading the user with irrelevant data.
 
+@subsection Sometimes the user didn't write the code
+
+GCC is typically used in two different ways:
+
+@itemize @bullet
+@item
+Semi-interactive usage: GCC is used as a development tool when the user
+is writing code, as the ``compile'' part of the ``edit-compile-debug''
+cycle.  The user is actively hacking on the code themself (perhaps a
+project they wrote, or someone else's), where they just made a change
+to the code and want to see what happens, and to be warned about
+mistakes.
+
+@item
+Batch rebuilds: where the user is recompiling one or more existing
+packages, and GCC is a detail that's being invoked by various build
+scripts.  Examples include a user trying to bring up an operating system
+consisting of hundreds of packages on a new CPU architecture, where the
+packages were written by many different people, or simply rebuilding
+packages after a dependency changed, where the user is hoping
+``nothing breaks'', since they are unfamiliar with the code.
+@end itemize
+
+Keep both of these styles of usage in mind when implementing diagnostics.
+
 @subsection Precision of Wording
 
 Provide the user with details that allow them to identify what the
-- 
2.26.2



Re: Constraints and branching in -fanalyzer

2021-02-26 Thread David Malcolm via Gcc
On Fri, 2021-02-26 at 04:23 +, brian.sobulefsky wrote:
> Hi,
> 
> I have implemented the discussed change, bootstrapped, and run the
> testsuite. I
> would be submitting except to my disappointment I saw failures
> increase by 4. As
> it turns out, these "failures" are actually passes that had been
> marked "xfail"
> and "TRUE" "desired" in the testsuite. The items in question are in
> testsuite
> files gcc.dg/analyzer/operations.c and params.c. In particular
> operations.c
> is only partially fixed because, as I have described, I thus far have
> only added
> cases for PLUS and MINUS. As you can see in that test file, you have
> some tests
> involving multiplication and division. My question is, before
> bothering to
> submit would you like me to just add handlers for these? I guess it
> will save us
> a patch cycle.

Can you post what you have so far?

It's easier for me to understand a patch by looking at the patch,
rather than a description of a patch, if that makes sense.

Is the issue that doing a full bootstrap&test cycle is too slow?  If so
I'm fine with you posting preliminary patches for discussion if you're
upfront about the ones that haven't been through a full bootstrap&test
run.  Also, would it help if you had access to the GCC compiler farm? 
There are some very fast machines there.

(that said, I'm meant to be taking a day off today so I ought to sign
off for now)

Dave

> Also, your comment regarding overflows is well taken, but I think we
> should fix
> the overall problem first, then worry about the overflow corner case.
> 
> Brian
> 




Re: GSoC 2021 - Static analyzer project

2021-03-05 Thread David Malcolm via Gcc
On Fri, 2021-03-05 at 17:04 +0530, Ankur Saini via Gcc wrote:
> Hello,

Hi Ankur

> While looking for some project to contribute on for GSOC 2021, I came
> across project about extending static analyser pass, especially the
> part that involve adding C++ support to it.
> 
> I have already used -fanalyzer option ( which I initially came to
> know about via some blog post ) a couple of times to make debugging
> process of some of my C projects easier and faster ( especially
> thanks to the part where it also provides CWE code of the error along
> with the error message )  but always wanted a C++ version of it ever
> since ( as that is the language I use the most ), and finding it as a
> project idea for this years GSOC sounded a perfect opportunity for me
> to try and contribute something to this project.
> 
> I have already built the compiler from the source code and was able
> to run a testsuit for it as mentioned in “Before you apply” section
> of the “Summer Of Code” page of gcc
> (https://gcc.gnu.org/wiki/SummerOfCode),
> currently I am in process of reading this
> (https://gcc.gnu.org/onlinedocs/gccint/Analyzer-Internals.html#Analyzer-Internals)
> documentation to understand how things are going on under the hood
> and trying to make sense out of the source code of the analyzer
> itself with the help of it.

Sounds great.

> I have some questions before applying
> 
> - Am I on right path before applying for the project ? 

What you're doing sounds like the right approach.

> - Is there a way I can contribute some small bug fixes before
> applying for the real project itself 
> ( although I am scanning the bug tracker
> (https://gcc.gnu.org/bugzilla/buglist.cgi?bug_status=UNCONFIRMED&bug_status=NEW&bug_status=ASSIGNED&bug_status=SUSPENDED&bug_status=WAITING&bug_status=REOPENED&bug_status=VERIFIED&component=analyzer&product=gcc)
> for any potential quick fix but any help in finding one would be a
> great ) ? 

I fear that at this point I've fixed all the easy bugs and it's only
the more difficult ones remaining :(

If you run the analyzer on your own code, and can trigger a false
positive or a false negative with the analyzer on it, and try to figure
out the issue, that could be a useful step (though it might turn out to
be a difficult one to fix, of course...)


There is a tracker bug for C++ support in the analyzer here:
  https://gcc.gnu.org/bugzilla/showdependencytree.cgi?id=97110
though obviously that would be actually doing the project itself.

To set expectations of what's reasonable to do in one summer - I don't
expect someone to be able to fully implement C++ support in one GSoC
project; for example, both of
  (a) implementing exception-handling and
  (b) implementing RTTI/vfuncs
are each probably big enough by themselves to take all summer.  So you
might want to pick one of those two to focus on (there are some notes
on each in the bugzilla comments).

> - Is there anything else I should be aware of before applying ?

I think if you've read the internals doc and the various organization
stuff on https://gcc.gnu.org/wiki/SummerOfCode page you're on the right
lines.

Hope this is helpful; good luck!

Dave



Re: [PATCH] docs: add interactive vs batch distinction to UX guidelines

2021-03-10 Thread David Malcolm via Gcc
On Mon, 2021-02-22 at 21:26 -0500, David Malcolm wrote:
> On Sat, 2021-02-20 at 17:49 +0100, David Brown wrote:
> > 
> > 
> > On 20/02/2021 16:46, David Malcolm wrote:
> > > On Sat, 2021-02-20 at 15:25 +0100, David Brown wrote:
> > 
> > 
> > > 
> > > I think we need to think about both of these use-cases e.g. as we
> > > implement our diagnostics, and that we should mention this
> > > distinction
> > > in our UX guidelines...
> > > 
> > > > Is it possible to distinguish these uses, and then have
> > > > different
> > > > default flags?  Perhaps something as simple as looking at the
> > > > name
> > > > used
> > > > to call the compiler - "cc" or "gcc" ?
> > > > 
> > > 
> > > ...but I'm wary of having an actual distinction between them in
> > > the
> > > code; it seems like a way to complicate things and lead to
> > > "weird"
> > > build failures.
> > > 
> > 
> > Fair enough.
> 
> [...snip...]
> 
> How about the following addition to the User Experience Guidelines?

I've gone ahead and pushed this to trunk (as
c4a36bb1e1be0b826e71f4723c9f66266aa86b6f), after checking it
bootstrapped.

Dave

> gcc/ChangeLog:
> * doc/ux.texi: Add subsection contrasting interactive versus
> batch usage of GCC.
> ---
>  gcc/doc/ux.texi | 25 +
>  1 file changed, 25 insertions(+)
> 
> diff --git a/gcc/doc/ux.texi b/gcc/doc/ux.texi
> index fdba5da1598..28d5994d10f 100644
> --- a/gcc/doc/ux.texi
> +++ b/gcc/doc/ux.texi
> @@ -86,6 +86,31 @@ information to allow the user to make an informed
> choice about whether
>  they should care (and how to fix it), but a balance must be drawn
> against
>  overloading the user with irrelevant data.
>  
> +@subsection Sometimes the user didn't write the code
> +
> +GCC is typically used in two different ways:
> +
> +@itemize @bullet
> +@item
> +Semi-interactive usage: GCC is used as a development tool when the
> user
> +is writing code, as the ``compile'' part of the ``edit-compile-
> debug''
> +cycle.  The user is actively hacking on the code themself (perhaps a
> +project they wrote, or someone else's), where they just made a
> change
> +to the code and want to see what happens, and to be warned about
> +mistakes.
> +
> +@item
> +Batch rebuilds: where the user is recompiling one or more existing
> +packages, and GCC is a detail that's being invoked by various build
> +scripts.  Examples include a user trying to bring up an operating
> system
> +consisting of hundreds of packages on a new CPU architecture, where
> the
> +packages were written by many different people, or simply rebuilding
> +packages after a dependency changed, where the user is hoping
> +``nothing breaks'', since they are unfamiliar with the code.
> +@end itemize
> +
> +Keep both of these styles of usage in mind when implementing
> diagnostics.
> +
>  @subsection Precision of Wording
>  
>  Provide the user with details that allow them to identify what the




Re: GSoC project idea

2021-03-12 Thread David Malcolm via Gcc
On Thu, 2021-03-11 at 12:59 +0530, srishty bedi via Gcc wrote:
> Greetings,

Hi Srishty

Various remarks inline below...

> First of all Congratulations to the gcc community on being selected
> for
> GSOC 2021.
> 
> My name is Srishty Bedi, I am a sophomore pursuing btech CSE in
> India. .I
> am interested in web development and have worked with
> JS,HTML,CSS,bootstrap
> for front end and php,pug for backend.

It sounds like your skillset is more on web development than on
compilers.  We don't do much HTML/JS within GCC [1], so I'd recommend
looking at another project that's more aligned to your interests,
especially given that GSoC is much shorter than usual this year - we
are looking for people who are already up-to-speed on C++ and on
compiler internals, and there simply isn't time if you don't already
have some skills in those areas.

> The ideas for gsoc 21 :
> 
> 
> 
> 1) I was thinking that whenever we run code on the gcc compiler, its
> beautify feature doesn't work well: it shifts the whole code to the left
> instead. So I thought of creating a good beautify feature, as we have in
> vs code, so that the code is easily understandable.

I'm not sure what you mean by the "beautify feature" shifting "the
whole code to the left".

Are you referring to the way GCC quotes the user's source code when GCC
emits warnings and errors?  I maintain that part of the code, so I'm
interested in hearing of ideas for improvement, but I don't see it
being a GSoC project this year.


> 2) We could also make the gcc compiler accessible on phones and ipads,
> so that it is compatible with touch devices as well, which would be very
> beneficial.

It's possible to use GCC from such devices using the excellent
godbolt.org website.


> 
> I am very fascinated by the world of open source and I am looking
> forward
> to contributing to this community .
> 
> Looking forward to your help and guidance.
> 
> Hoping for a positive response from you.
> 
> Here is my github link: https://github.com/srishty-07
> 
> 
> Regards,
> 
> Srishty

Hope this is helpful; good luck finding a suitable project
David

[1] FWIW I've been working on adding HTML/JS output to GCC:
https://gcc.gnu.org/pipermail/gcc-patches/2020-November/558603.html
but the main issues there are on what a web developer would call the
"backend", in terms of populating the HTML with meaningful content, as
opposed to what a web developer would call the "frontend" (templates,
CSS and JS).  i.e. there's some gnarly C++ code that needs overhauling




Re: GSOC-2021

2021-03-22 Thread David Malcolm via Gcc
On Sun, 2021-03-21 at 00:31 +0530, Namitha S via Gcc wrote:
> Hi,
> I am Namitha S, an undergrad from Amrita University. This mail is
> regarding GSOC-2021: I wanted to know more about the project "Extend the
> static analysis pass". I've gone through the wiki and finished the tasks
> listed out in the "Before you apply" section. I've already read the mail
> replies that were written earlier for a query related to the same
> project. Are there any other things you'd recommend looking at? Looking
> forward to hearing back.

Hi

I'm the author of the static analysis pass and would be the mentor for
any student(s) accepted for projects relating to it for GSoC.

There are various subprojects listed on that wiki page, some of which
other students have already expressed interest in (kernel support and
C++, although there's at least 2 projects' worth of material in the
latter).  Do you have a preference for any of them?  No-one has yet
expressed an interest in SARIF support, so that might make for a good
self-contained GSoC project to propose.

Hope this is helpful
Dave



Re: GSoC

2021-03-22 Thread David Malcolm via Gcc
Hi Isitha (and Philip!)

If I'm reading Isitha's email correctly, it talks about static
analysis, whereas Philip's talks about GCC Rust, so some wires got
crossed somewhere.

I'm the author of the GCC static analysis pass.  I should confess that
I still feel like I'm learning static analysis myself - I too own a
copy of the Nielson, Nielson & Hankin book you mention, but have only
skimmed it.  FWIW, I find the very early papers by Patrick and Radhia
Cousot from the beginning of the field much easier to read, as they
take more time spelling out the meaning of the mathematics.  I should
also confess that the analysis pass takes some liberties compared to a
formal approach, grabbing ideas from here and there, plugging them into
a 30+-year-old codebase in a way that I hope is a reasonable trade-off
between speed, (lack of) soundness, (lack of) completeness, and
readability of output by end-user.

The static analysis pass is meant to be reasonably modular, so the
various suggested projects listed on the wiki page ought to be
implementable without knowing everything all at once.

However, as Philip says, GSoC imposes a particular timeline, and I
don't know to what extent might be a dealbreaker.

Hope this is helpful
Dave

On Fri, 2021-03-19 at 13:24 +, Philip Herron wrote:
> Hi Isitha,
> 
> Thanks for your interest in GCC Rust, it's an exciting project that
> is
> early on in development, so there is plenty of scoping for making
> your mark
> on the compiler. In regards to your proposal feel free to join our
> Zulip
> server https://gcc-rust.zulipchat.com/ and it can be discussed with
> the
> community.
> 
> As for the Google Summer of Code timeline, I would have to defer to
> their
> rules. Maybe others here know better in this mailing list but as far
> as I
> know, to complete the google summer of code there are dated
> milestones of
> review so this might break the rules if you have exams and are unable
> to
> allocate the time towards it.
> 
> Hope this helps, I hope it works out for you.
> 
> Thanks
> 
> --Phil
> 
> 
> 
> On Fri, 19 Mar 2021 at 08:04, Isitha Subasinghe via Gcc
> 
> wrote:
> 
> > To whom it may concern,
> > 
> > I am a student interested in participating in GSoC this year. After
> > having
> > a look at some of the available PL projects, gccrs caught my
> > attention. I
> > love Rust and have an interest in exploring more about type theory
> > and
> > automatic garbage collection.
> > 
> > My background is that I am a Masters's student at the University of
> > Melbourne in Australia, I have undertaken a graduate-level compiler
> > class
> > where we implemented a stack-based compiler in Haskell.
> > 
> > I am quite interested in working on the static analysis project but
> > wanted
> > feedback to iron out and address my proposal before I submit it.
> > 
> > I am quite confident in my C/C++ skills but somewhat unsure about
> > the level
> > of knowledge of static analysis that I would need. Unfortunately, I
> > am yet
> > to take any classes in this particular subfield but I am absolutely
> > happy
> > to learn on my own time and have purchased the book Principles of
> > Program
> > Analysis to assist with this matter.
> > 
> > Also, I did want to notify you that I would be available for less
> > than the
> > entire coding duration of GSoC due to university commitments.
> > Unfortunately, my exams overlap with GSoC, and it is hard to
> > compromise on
> > University studies since I am hoping to do a PhD in PL after the
> > completion
> > of my master's. I would be absolutely happy to make up this time at
> > the end
> > of the year where I have a 3-month break.
> > 
> > Best Regards,
> > Isitha
> > 
> 




[committed] MAINTAINERS: add myself as static analyzer maintainer

2021-03-23 Thread David Malcolm via Gcc
On Tue, 2021-03-23 at 08:44 -0600, Jeff Law wrote:
> 
> I am pleased to announce that the GCC Steering Committee has
> appointed 
> David Malcolm as maintainer of the GCC static analyzer.
> 
> 
> David, please update your listing in the MAINTAINERS file.

Thanks.

I've pushed the following to trunk (as 19599551045412a9badb33543f8bd26db039f5f1)

ChangeLog:
* MAINTAINERS: Add myself as static analyzer maintainer.
---
 MAINTAINERS | 1 +
 1 file changed, 1 insertion(+)

diff --git a/MAINTAINERS b/MAINTAINERS
index 75954753161..1722f0aa8fc 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
 gdbhooks.py            David Malcolm
 SLSR                   Bill Schmidt
 jit                    David Malcolm
 gen* on machine desc   Richard Sandiford
+static analyzer        David Malcolm
 
 Note that individuals who maintain parts of the compiler need approval to
 check in changes outside of the parts of the compiler they maintain.
-- 
2.26.2



Re: [GSoC-2021] Interested in project `Extend the static analysis pass`

2021-03-25 Thread David Malcolm via Gcc
On Thu, 2021-03-25 at 14:52 +0530, Saloni Garg via Gcc wrote:
> Hi all,
> I am an undergraduate student in AMU, Aligarh. I am interested in the
> project* `Extend the static analysis pass`. *I have followed this(
> https://gcc.gnu.org/pipermail/gcc/2021-March/234941.html) and been
> able to
> successfully build and successfully ran and pass the test suite for C
> and
> C++.
> 
> I found this sub-project `C++ support (new/delete checking,
> exceptions,
> etc)` interesting and may be the conservative code for this can be
> made
> along the lines of malloc/free implementation in C. I found here(
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94355) that some part of
> it
> has already been implemented . I would like to expand it further and
> learn
> about it, maybe start with writing some test cases, please let me
> know.
> 
> Further, I am inclined on this(
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97111). Let me know if
> it is
> still available.
> 
> Looking forward to hearing from you guys.
> Thanks,
> Saloni Garg

Hi!

I'm the author/maintainer of the static analysis pass, and would be the
mentor for any GSoC project(s) involving it.

I've already implemented most of the new/delete checking in GCC 11; the
big missing component there is exception-handling.

Implementing exception-handling in the analyzer could make a good GSoC
project: it's non-trivial, but hopefully doable in one summer.  I see
you've already seen bug 97111, and there are some links in that bug to
resources.  Given that the analyzer runs on the gimple-ssa
representation, by the time it sees the code, much of the exception-
handling has already been translated into calls to various __cxa_-
prefixed functions in the C++ runtime, so part of the work would
involve "teaching" the analyzer about those functions.  One way to make
a start on this would be to create a collection of trivial C++ examples
that use exceptions, and then look at analyzer dumps to see what IR is
being "seen" by the analyzer for the various constructs.   (I actually
started this a long time ago and have a very crude barely-working
prototype, but it was just the start, and I've forgotten almost all of
it...)
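
For instance, something as small as this (just an illustration of the
kind of trivial input I have in mind, not an existing testcase):

  extern void might_throw ();

  int test ()
  {
    try
      {
        might_throw ();
      }
    catch (...)
      {
        return -1;
      }
    return 0;
  }

compiled with e.g. "g++ -c test.cc -fanalyzer -fdump-ipa-analyzer=stderr"
will show the IR that the analyzer receives for the try/catch.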

Hope this is helpful
Dave




Re: Remove RMS from the GCC Steering Committee

2021-03-26 Thread David Malcolm via Gcc
On Fri, 2021-03-26 at 20:51 +, Jonathan Wakely via Gcc wrote:
> On Fri, 26 Mar 2021, 20:03 Nathan Sidwell,  wrote:
> 
> > 
> > Dear members of the GCC Steering Committee (SC),  I ask you to
> > remove
> > Richard
> > Stallman (RMS) from the SC, or, should you chose not to do so, make
> > a
> > clear
> > statement as to why he remains.
> > 
> 
> I second Nathan's request, and agree with everything he said.

To any observers seeing this email unfamiliar with the project, I
wanted to note that Jonathan is currently one of the most prolific
contributors to GCC, showing up as #2 on this list for commit count to
GCC over the last year, with 590 commits:
  
https://www.openhub.net/p/gcc/contributors?query=&time_span=&sort=twelve_month_commits
[1]

I too second Nathan's request, and agree with what he said.

(Nathan shows up as #7 on that list, and Marek who has also replied
whilst I've been writing this shows up as #9; I'm #11 on that list).

> It's important for GCC today, and for potential future contributors.

Agreed.

I own an autographed copy of RMS's book from when I used to look up to
him, but, frankly, I think the cause of Free Software would have been
helped greatly by him taking retirement from the FSF/GNU at least a
decade ago.

I should note that I'm writing this in a personal capacity, not on
behalf of Red Hat, though FWIW Red Hat has said the following on the
matter of RMS's return to the FSF board:
https://www.redhat.com/en/blog/red-hat-statement-about-richard-stallmans-return-free-software-foundation-board


Now to get back to fixing bugs...
Dave

[1] actually the 12 month period ending 2 months ago, since that was
the last time the project was rescanned by openhub



Re: [GSoC-2021] Interested in project `Extend the static analysis pass`

2021-03-28 Thread David Malcolm via Gcc
On Sun, 2021-03-28 at 18:06 +0530, Saloni Garg wrote:
> Hi, I have tried the following examples with the fanalyzer option in
> g++.
> 
> 1 (a)
> void myFunction()
> {
>     char *p =new char;
> }
> int main()
> {
>    func();
>    return 0;
> }

BTW, are you familiar with Compiler Explorer (godbolt.org)?  It's very
handy for testing small snippets of code on different compilers and
different compiler versions.  Though I don't know how long the URLs are
good for (in terms of how long code is cached for)

Fixing up the name of the called function to "func":
  https://godbolt.org/z/TnM6n4xGc
I get the leak report, as per RFE 94355.  This warning looks correct,
in that p does indeed leak.

I should mention that the analyzer has some special-casing for "main",
in that the user might not care about one-time leaks that occur within
"main", or something only called directly by it; this doesn't seem to
be the case here.  If I remove the implementation to main, the analyzer
still correctly complains about the leak:
  https://godbolt.org/z/zhK4vW6G8

: In function 'void func()':
:4:1: warning: leak of 'p' [CWE-401] [-Wanalyzer-malloc-leak]
4 | }
  | ^
  'void func()': events 1-2
|
|3 | char *p =new char;
|  |  ^~~~
|  |  |
|  |  (1) allocated here
|4 | }
|  | ~ 
|  | |
|  | (2) 'p' leaks here; was allocated at (1)
|


> 1(b)
> void myFunction()
> {
>     try {
>  char *p = new char;
>  throw p;
>     }
>     catch(...) {
>     }
> }
> int main()
> {
>    myFunction();
>    return 0;
> }
> In 1(a), there is no exception handling. When I ran `cc1plus`, a
> memory
> leak was reported as shown in bug #94355.
> In 1(b), there is a use of exception handling. When I ran `cc1plus`,
> no
> memory leaks were detected. I believe there should be one. Can you
> please
> confirm from your side as well?

I too am seeing no diagnostics on 1(b).

> As you said all the calls to try, catch and
> throw got converted to __cxa_-prefixed functions. 

-fdump-ipa-analyzer=stderr shows the __cxa_-prefixed functions:
  https://godbolt.org/z/YMa9dE6aM
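
Roughly speaking, the throw/catch in 1(b) gets lowered into calls like
the following (a simplified sketch based on the Itanium C++ ABI runtime
entry points, not a verbatim copy of that dump):

  p_5 = operator new (1);
  _6 = __cxa_allocate_exception (8);
  /* ...the thrown pointer is stored into the exception object...  */
  __cxa_throw (_6, &_ZTIPc, 0B);
  [...]
  /* ...and in the handler:  */
  _7 = __cxa_begin_catch (_8);
  __cxa_end_catch ();

so part of the work is "teaching" the analyzer what those calls mean.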

> I am trying to find the
> places where the corresponding checks can be placed for the analysis
> of
> exception handling in gimple IR.

Have a look at exploded_node::on_stmt in engine.cc; in particular, see
the GIMPLE_CALL case in the switch statement.  Most of the the
analyzer's "knowledge" of the behaviors of specific functions is here,
or called from here.

The simpler cases are handled in the call to
  m_region_model->on_call_pre
for functions which merely update state, which are implemented in
region-model-impl-calls.cc

Cases involving state machines (e.g. allocation) are handled in the:
  sm.on_stmt
call towards the bottom of the function.

But exception-handling is a special case, in that it affects control
flow.  The closest thing to compare it to currently within the analyzer
is setjmp/longjmp, so it's worth stepping through how that is handled.
In particular, the real implementation of longjmp involves directly
updating the program counter, registers and stack, potentially popping
multiple stack frames.  This is similar to what throwing an exception
does.

So I'd recommend looking at the analyzer's implementation of
setjmp/longjmp, the custom classes that I added to handle them, and
stepping through how exploded_node::on_stmt handles setjmp and longjmp
calls, and what the resulting exploded_graph looks like (-fdump-
analyzer-exploded-graph and -fdump-analyzer-supergraph), in that
special-cased edges have to be created that weren't in the original
CFGs or callgraph (for the interprocedural case).

I think an implementation of exception-handling would look somewhat
similar.
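
If it helps, here's a minimal setjmp/longjmp example of the kind I mean
(hypothetical; any variant that keeps the dumps small will do):

  #include <setjmp.h>
  #include <stdlib.h>

  static jmp_buf env;

  static void inner (void)
  {
    void *p = malloc (16);
    longjmp (env, 1);  /* non-local jump back to the setjmp in "outer",
                          skipping the free below */
    free (p);
  }

  void outer (void)
  {
    if (!setjmp (env))
      inner ();
  }

Running this through -fanalyzer with -fdump-analyzer-exploded-graph
should show the special-cased interprocedural edge for the longjmp,
which is the closest existing analogue to what a "throw" would need.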

> Please, let me know your thoughts on this.

Looks like you're making a great start.

Hope this is helpful
Dave


> On Fri, Mar 26, 2021 at 12:48 AM David Malcolm 
> wrote:
> 
> > On Thu, 2021-03-25 at 14:52 +0530, Saloni Garg via Gcc wrote:
> > > Hi all,
> > > I am an undergraduate student in AMU, Aligarh. I am interested in
> > > the
> > > project* `Extend the static analysis pass`. *I have followed
> > > this(
> > > https://gcc.gnu.org/pipermail/gcc/2021-March/234941.html) and
> > > been
> > > able to
> > > successfully build and successfully ran and pass the test suite
> > > for C
> > > and
> > > C++.
> > > 
> > > I found this sub-project `C++ support (new/delete checking,
> > > exceptions,
> > > etc)` interesting and may be the conservative code for this can
> > > be
> > > made
> > > along the lines of malloc/free implementation in C. I found here(
> > > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94355) that some
> > > part of
> > > it
> > > has already been implemented . I would like to expand it further
> > > and
> > > learn
> > > about it, maybe start with writing some test cases, please let me
> > > know.
> > > 
> > > Further, I am inclined on th

Re: [GSoC-2021] Interested in project `Extend the static analysis pass`

2021-03-30 Thread David Malcolm via Gcc
On Tue, 2021-03-30 at 16:06 +0530, Saloni Garg wrote:
> On Sun, Mar 28, 2021 at 8:03 PM David Malcolm 
> wrote:
> 
> > On Sun, 2021-03-28 at 18:06 +0530, Saloni Garg wrote:
> > > Hi, I have tried the following examples with the fanalyzer option
> > > in
> > > g++.
> > > 
> > > 1 (a)
> > > void myFunction()
> > > {
> > > char *p =new char;
> > > }
> > > int main()
> > > {
> > >func();
> > >return 0;
> > > }
> > 
> > BTW, are you familiar with Compiler Explorer (godbolt.org)?  It's
> > very
> > handy for testing small snippets of code on different compilers and
> > different compiler versions.  Though I don't know how long the URLs
> > are
> > good for (in terms of how long code is cached for)
> > 
> > Fixing up the name of the called function to "func":
> >   https://godbolt.org/z/TnM6n4xGc
> > I get the leak report, as per RFE 94355.  This warning looks
> > correct,
> > in that p does indeed leak.
> > 
> > Hi, thanks for the effort, sorry for the typo. I now know about the
> godbolt.org and it is certainly useful.
> 
> > I should mention that the analyzer has some special-casing for
> > "main",
> > in that the user might not care about one-time leaks that occur
> > within
> > "main", or something only called directly by it; this doesn't seem
> > to
> > be the case here.  If I remove the implementation to main, the
> > analyzer
> > still correctly complains about the leak:
> >   https://godbolt.org/z/zhK4vW6G8
> > 
> > That's something new. I also didn't know that. I believe we can
> > shift our
> minimal example to just func() and remove main().

Yes - simpler is better with such examples.

(Occasionally it's helpful to have "main" so that the resulting code
can be executed - especially under valgrind, as a check that something
really is leaking - but a simpler reproducer is usually best when
debugging)

[...snip...]

> > I think an implementation of exception-handling would look somewhat
> > similar.
> > 
> > Thanks, for all the references to the code. I am new to GCC, so
> > apologies
> if I am a bit slow in understanding. I am trying to run and go
> through all
> the references that you gave me.

Sorry if I'm overwhelming you with too much at once...

...and here's yet more information!

I wrote this guide to getting started with hacking on GCC, aimed at
newcomers to the project:
  https://dmalcolm.fedorapeople.org/gcc/newbies-guide/

and in particular you may find the guide to debugging GCC useful:
  https://dmalcolm.fedorapeople.org/gcc/newbies-guide/debugging.html

FWIW I like to use
  -fdump-analyzer-stderr
when stepping through the analyzer in gdb, so that I can put
breakpoints on what I'm interested in, but also have a log of the
activity that happened between the breakpoints.
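
For example, something like this (just a sketch; adjust the breakpoint
to whatever you happen to be interested in):

  $ gcc -fanalyzer -fdump-analyzer-stderr -c test.c -wrapper gdb,--args
  (gdb) break ana::exploded_graph::process_node
  (gdb) run

so that the dump output is interleaved with the places where gdb stops.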


> > > Please, let me know your thoughts on this.
> > 
> > Looks like you're making a great start.
> > 
> Thanks for the feedback.  In parallel, can I start working on the
> Gsoc
> proposal as well?

Please do work on the formal proposal - without it we can't accept you
as a GSoC student.  The window for submitting proposals opened
yesterday, and I believe it closes in a couple of weeks, and you need
to do that, so any experimentation you do now should really just be in
support of writing a good proposal.  It would be a shame to not have a
good prospective candidate because they didn't allow enough time to do
the proper GSoC paperwork before the deadline.

> I hope, we can get suggestions from the gcc community as
> well once the things are written properly in a document.

Indeed

Hope this is constructive
Dave

[...snip...]



Re: GSoC 2021 - Static analyzer project

2021-03-30 Thread David Malcolm via Gcc
On Tue, 2021-03-30 at 16:36 +0530, Ankur Saini wrote:
> hello sir 
> 
> in my quest of finding a bug ( which ended up being a feature ) along
> with its source in the analyzer, I tested the code on these 2 code
> snippets and here’s how I went towards it 
> 
> (1)
> int main()
> {
>     int *ptr = (int *)malloc(sizeof(int));
>     return 0;
> }
> 
> link to running example (https://godbolt.org/z/1jGW1qYez)
> 
> (2)
> int definaltly_main()
> {
>     int *ptr = (int *)malloc(sizeof(int));
>     return 0;
> }
> 
> link to running example (https://godbolt.org/z/bzjMYex4M)
> 
> 
> where on the second snippet the analyzer is warning us about the leak
> as it should be, but in the first one it isn’t. 
> 
> and as the gimple representation of both looks exactly the same apart
> from the function name, this made me think that either intentionally or
> unintentionally, the analyzer handles the case of main() differently
> from any other function.

Correct - the analyzer special-cases "main".

Specifically, in impl_region_model_context::on_state_leak, there's this
code:
  
https://gcc.gnu.org/git/?p=gcc.git;a=blob;f=gcc/analyzer/engine.cc;h=d7866b5598b4fcb791ec6ff511dde9b7615e7794;hb=HEAD#l624

 624   /* Don't complain about leaks when returning from "main".  */
 625   if (m_enode_for_diag->get_supernode ()
 626   && m_enode_for_diag->get_supernode ()->return_p ())
 627 {
 628   tree fndecl = m_enode_for_diag->get_function ()->decl;
 629   if (id_equal (DECL_NAME (fndecl), "main"))
 630 {
 631   if (logger)
 632 logger->log ("not reporting leak from main");
 633   return;
 634 }
 635 }

on the grounds that for the resources that the analyzer currently
tracks, it doesn't matter if they aren't cleaned up as the process
exits, so we don't bother the user with a report about them.

> so while looking at it’s exploded graphs I found out that the only 2
> differences in them 
> 
> 1. There is one less exploded node(after node E-8) created in first
> one ( I earlier thought state merging or state pruning is taking
> place here but it isn’t because the results are not affected even
> after disabling those using  `-fno-analyzer-state-purge` and `-fno-
> analyzer-state-merge` options )

Well spotted.

I see that too.

For (1), EN 8 has 2 stmts, the label and the return, but for (2), it
splits them with EN: 8 having just the label, and the return split into
EN 9.

I was surprised by this and did some digging in gdb, for both (1) and
(2).  The reason seems to be rather arbitrary; specifically in (2),
stmt_requires_new_enode_p is returning true due to hitting this case:

2894  /* If we had a PREV_STMT with an unknown location, and this stmt
2895 has a known location, then if a state change happens here, it
2896 could be consolidated into PREV_STMT, giving us an event with
2897 no location.  Ensure that STMT gets its own exploded_node to
2898 avoid this.  */
2899  if (get_pure_location (prev_stmt->location) == UNKNOWN_LOCATION
2900  && get_pure_location (stmt->location) != UNKNOWN_LOCATION)
2901return true;

and presumably it isn't hitting this for (1).

Specifically, where "stmt" is the return stmt and "prev_stmt" is the
label statement.

for (1):

(gdb) p /x stmt->location
$1 = 0x0
(gdb) p /x prev_stmt->location
$2 = 0x0

whereas for (2):

(gdb) p /x stmt->location
$5 = 0x80f2
(gdb) p /x prev_stmt->location
$6 = 0x0

So with (1) the return stmt has UNKNOWN_LOCATION:

(gdb) call inform (stmt->location, "return stmt in (1)")
In function ‘main’:
cc1: note: return stmt in (1)

whereas for (2) the return stmt has a source location:

(gdb) call inform (stmt->location, "return stmt in (2)")
In function ‘definaltly_main’:
/tmp/2.c:6:12: note: return stmt in (2)
6 | return 0;
  |^

This slight difference in the recorded location of the return stmt for
the "main" vs non-"main" case affects the splitting of the nodes.

Athough a curiosity, I don't think this is significant.  (In theory one
could use a hardware watchpoint on stmt->location to track this
discrepancy down further, but I don't think it's important enough to
bother).


> 2. no diagnosis for malloc leak happening at the end of first one
> even though there exist a pointer in unchecked state at the end (
> according to the malloc state machine )

Correct - the analyzer special-cases "main" and ignores it, as noted
above.


> In quest to find the cause I started navigating through the source
> code of the analyser starting off with the run_checkers() function in
> engine.cc which looks like the entry point of the analyser ( found
> via the commit history of the analyzer ). But finally it ended at
> `impl_region_model_context::on_state_leak()` function where I found
> out that analyzer is intentionally skipping the leak report when it
> is in main. 

Sounds like you successfully tra

Re: Interested In extend the static analysis pass

2021-03-31 Thread David Malcolm via Gcc
On Wed, 2021-03-31 at 16:59 +0530, Gagandeep Bhatia via Gcc wrote:
> Hey Team GNU Compiler, I'm Gagandeep Bhatia, currently pursuing the
> 2nd year at Christ University, Bangalore, India. You can reach me at
> gagandeepbhatia2...@gmail.com 
> or +919466935025.
> I went through your upcoming projects on Google Summer of Code, your
> idea on  extend the static analysis pass is really impressive. 

Thank you.

> Right now the code is in GCC’s master branch for GCC 10 and can be
> tried out on Compiler Explorer, aka godbolt.org. It works well for
> small and medium-sized examples, but there are bugs that mean it’s
> not ready for production use. I’m working hard on fixing things in
> the hope that the feature will be meaningfully usable for C code.

You seem to have simply copied and pasted the above three sentences
from my March 2020 blog post on the analyzer.

As Jonathan notes, the code has substantially changed since the GCC 10
release.

>  I truly want to become part of this project for better mutual
> achievement, in the past, I have done a few projects, which will help
> me to contribute my best in this project.
> It will be really kind if you can share some more details about your
> project, so I learn more about your project.
> with regards.

There are some more concrete ideas about the analyzer on the wiki page
here:
  https://gcc.gnu.org/wiki/SummerOfCode
and there is a link there to the wiki page about the analyzer which has
lots more information.

You may want to read the archives of this mailing list, where some of
those ideas are discussed in more detail with other prospective GSoC
students.

Hope this is constructive
Dave




Re: Remove RMS from the GCC Steering Committee

2021-03-31 Thread David Malcolm via Gcc
On Wed, 2021-03-31 at 16:18 +0200, Christopher Dimech via Gcc wrote:

[...snip...]

> As for the "safe spaces" phase, this is about eliminating anything
> and
> everything that could emotionally troubling students. This assumes a
> high
> degree of fragility among western students.  I work as a journalist
> and
> have had colleagues blown to smithereens - foot there, bits of brain
> there.
> I wonder how many of you bitches, have ever been shot or had a bomb
> blown
> up your ass.   

I've been attempting to decide if you're merely trolling us, or if you
genuinely believe the stuff you've been posting to this list.

With your latest missive I'm leaning to the former interpretation, but
if the latter, may I humbly suggest that referring to us as "bitches"
might not be the best way to win people over, and that it's not normal
to have to work in a literal war zone, and that most reasonable people
do not want to work in a figurative war zone.

[...snip...]

Hope this is constructive
Dave

(my opinions only, not my employer's)



Re: [GSoC-2021] Interested in project `Extend the static analysis pass`

2021-03-31 Thread David Malcolm via Gcc
On Wed, 2021-03-31 at 21:41 +0530, Saloni Garg wrote:
> On Tue, Mar 30, 2021 at 6:42 PM David Malcolm 
> wrote:
> 
> > On Tue, 2021-03-30 at 16:06 +0530, Saloni Garg wrote:
> > > On Sun, Mar 28, 2021 at 8:03 PM David Malcolm <
> > > dmalc...@redhat.com>
> > > wrote:

[...snip...]

> > > 
> No, it's actually fun learning all this. Thank you for sharing all
> the
> references. Although, I was already using gdb to travel inside the
> code.

Great!

> > 
> > > > > Please, let me know your thoughts on this.
> > > > 
> > > > Looks like you're making a great start.
> > > > 
> > > Thanks for the feedback.  In parallel, can I start working on the
> > > Gsoc
> > > proposal as well?
> > 
> > Please do work on the formal proposal - without it we can't accept
> > you
> > as a GSoC student.  The window for submitting proposals opened
> > yesterday, and I believe it closes in a couple of weeks, and you
> > need
> > to do that, so any experimentation you do now should really just be
> > in
> > support of writing a good proposal.  It would be a shame to not
> > have a
> > good prospective candidate because they didn't allow enough time to
> > do
> > the proper GSoC paperwork before the deadline.
> > 
> Thanks for understanding. Here is an initial draft (
>   
> https://docs.google.com/document/d/1inkkU5B55s_FOWRzUuf38s7XEet65kc0dW3yFn0yh1M/edit?usp=sharing
> )
> of my GSoC proposal. I am yet to fill in the missing blocks.
> Please, let me know if you have any comments on the document itself.

Caveat: I'm not familiar with the expected format of such documents.

Looks like a good first draft.

Some notes:
- maybe update the title to be more specific (i.e. that it's about
extending the pass to support C++ exception-handling)
- my email address is misspelled (missing the leading "d")
- in Example 2, maybe spell out why it's a leak - when does the
allocated buffer stop being referenceable?
- you have a simple example of a false negative; is it possible to give
a simple example of a false positive?  (I think "new" is meant to raise
an exception if it fails, so a diagnostic about a NULL-deref on
unchecked new might be a false positive.  I'm not sure)
- maybe specify that this is exception-handling specifically for C++
code (GCC supports many languages)
- "sample example programs": for "sample" did you mean to write
"simple" here?
- as well as understanding code, you'll need to understand data,
specifically getting a feel for the kinds of control flow graphs that
the analyzer is receiving as inputs i.e. what the analyzer "sees" when
the user inputs various C++ language constructs; what interprocedural
vs intraprocedural raise/try/catch situations look like, etc.

Hope this makes sense and is helpful
Dave



Re: Protest against removal of RMS from GCC Steering Committee

2021-04-01 Thread David Malcolm via Gcc
On Thu, 2021-04-01 at 17:23 +0200, Andrea G. Monaco wrote:
> 
> I strongly disagree with the removal of Dr. Stallman from the
> Steering
> Committee.

RMS was not removed from the GCC Steering Committee; his name was
removed from the *web page* of the steering committee.

Based on the discussion here it appears to me that he never was a
member of the steering committee, and the listing on that web page was
in error (and thus the recent thread on this was misnamed).

Rather, one of the roles of the steering committee seems to be to
interact with RMS and the FSF on behalf of the project so that the rest
of us can get on with maintaining our Free Software compiler.  Thanks
Steering Committee members!  (I liked David Edelsohn's analogy earlier
that the SC is about removing roadblocks so that the developers can get
on with things).

> Not only RMS wrote the GCC initially, but I think he is the best
> person
> by far who can guarantee the values of free software, with unmatched
> integrity and lucidity.
> 
> That's especially important in the SC, given the presence of powerful
> and evil corporations like Google and IBM.
> 
> I suspect that many people agree with me, but perhaps they are scared
> into silence by this intimidating and hostile mob.

The only intimidation I've seen on this list has been from one of RMS's
supporters.

Hope this is constructive
Dave

(my opinions, and not those of my employer, which though it may have
made mistakes over the years is not currently overtly evil, last time I
checked)




Re: GSoC 2021 - Static analyzer project

2021-04-06 Thread David Malcolm via Gcc
On Tue, 2021-04-06 at 17:56 +0530, Ankur Saini wrote:

Hi Ankur.

Various replies inline below throughout.

> > On 30-Mar-2021, at 7:27 PM, David Malcolm 
> > wrote:
> 
> > > This gave rise to some questions
> > > 
> > > 1. why does the analyzer make exceptions with the main() function
> > > ?
> > 
> > The user's attention is important - we don't want to spam the user
> > with
> > unnecessary reports if we can help it.
> 
> make sense. 
> 
> ——
> 
> After fiddling around with a lot of C code, I switched to C++
> programs in order to find out how exactly the analyzer doesn’t
> understand exception handling and, more interestingly, calls to virtual
> functions ( which I am thinking of working on this summer ). 

Sounds like a good focus.

> It was comparatively harder to find such an example where it would
> fail, as it looks like gcc does an amazingly nice job at devirtualising
> function calls ( even with the -O0 option ), but finally, after a lot of
> attempts and reading online about devirtualisation, I found this
> particular example where the analyzer produces a false negative
> 
> #include 
> 
> struct A
> {
>     virtual int foo (void) 
>     {
>     return 42;
>     }
> };
> 
> struct B: public A
> {
>     int *ptr;
>     void alloc ()
>     {
>     ptr = (int*)malloc(sizeof(int));
>     }
> int foo (void) 
>     { 
>     free(ptr);
>     return 0;
>     }
> };
> 
> int test()
> {
>     struct B b, *bptr=&b;
>     b.alloc();
>     bptr->foo();
>     return bptr->foo();
> }
> 
> int main()
> {
>     test();
> }
> 
> working link of the above code (https://godbolt.org/z/n17WK4MxG)
> 
> here, as the analyzer doesn’t understand the call to the virtual
> function, it wasn’t able to locate a double free in the program which
> was found at runtime.

Good work.

> so I went through its exploded graph to see how exactly this is
> being processed. And from the looks of things the analyzer didn’t
> understand the function call, which according to me was the following
> part in the gimple representation :
> 
> _1 = bptr_8->D.3795._vptr.A;
>  _2 = *_1;
> OBJ_TYPE_REF(_2;(struct B)bptr_8->0) (bptr_8)
> 
> after scanning the source code a bit I found out that such calls were
> being processed by "region_model::handle_unrecognized_call()", where
> it just keeps track of reachable states from that node.

Again, good detective work.

Right - the analyzer "sees" the jump through a function pointer, and it
doesn't yet have any special knowledge about what that function pointer
might be.

Given that we know we have a B, we ought to be able to assume that B::B
initializes the vtable of the instance, and make assumptions about what
the values in that vtable are.  The analyzer doesn't have any of this
special-case knowledge yet - hence bug 97114.

> ——
> 
> Questions 
> 
> 1. The link to the bug tracker for vfunc()
> [ https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97114 ] says that for
> vfuncs() to be understood by the analyzer, it ought to be able to
> devirtualize calls, but is it possible to devirtualise all the calls?
> What if it is random or depends on user input?

It's not possible for the general case.  Consider that there could be
other subclasses of A that we don't know about in this translation unit.
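
For instance (a made-up illustration, not taken from the bug report),
given only:

  int call_foo (A *a)
  {
    /* "a" could point to an A, to a B, or to an instance of some
       subclass defined in another translation unit, so in general
       this call can't be resolved statically.  */
    return a->foo ();
  }

(where A is the base class from your example), no amount of purely
local reasoning can devirtualize the call.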

But for this case, -fdump-ipa-analyzer=stderr shows this gimple at the
start of "test":

  B::B (&b);
  bptr_8 = &b;
  B::alloc (&b);
  _1 = bptr_8->D.3290._vptr.A;
  _2 = *_1;
  OBJ_TYPE_REF(_2;(struct B)bptr_8->0) (bptr_8);

As noted above, I think that we can add enough logic to the analyzer to
so that it "knows" that B::B should stores a vtable ptr, and then when
that vtable is accessed, it should know what functions are being
pointed to.  I think it would mean adding a new region subclass in the
analyzer, where an instance represents the vtable for a given class in
the user's code.

For cases where we have a "base *" and don't know which subclass the
instance is, we could potentially have the analyzer speculate about the
subclasses it knows about, adding exploded edges to speculated calls
for the various subclasses.  I don't yet know if this is a good idea,
but it seems worth experimenting with.

> 2. Even though the analyzer didn’t understand the calls to the virtual
> function, it didn’t give a warning about a memory leak either, which,
> according to it, should exist if the functions were never called to
> deallocate the pointer ( after exploded nodes 152 and 118, the state of
> malloc changes automatically ) ?

If the analyzer "sees" a call to an unknown function (either through a
unknown function pointer, or to a function it doesn't have the body
of), it acts conservatively by resetting all of the state-machine state
for the values that are reachable by the function call, to avoid false
leak reports.  Hence it resets the state of B's ptr from the
"unchecked" allocation state back to the "start" state.

> sorry if I am asking a lot of quest

Re: GCC association with the FSF

2021-04-07 Thread David Malcolm via Gcc
On Wed, 2021-04-07 at 00:22 +0200, Mark Wielaard wrote:
> Hi,
> 
> Lets change the subject now that this is about GCC and the FSF.
> 
> On Wed, Mar 31, 2021 at 01:46:29PM +0100, Jonathan Wakely via Gcc
> wrote:
> > Probably unintentionally, but he has allowed the GNU Project to
> > become
> > a nasty cult of personality. The FSF seems to be imploding (with
> > mass
> > resignations in the past week). I don't think GCC benefits from
> > being
> > associated with either of them.
> 
> I admit it isn't looking very good and their last announcement is
> certainly odd: https://status.fsf.org/notice/3833062
> 
> But apparently the board is still meeting this week to discuss and
> might provide a better statement about the way out of this. So lets
> give them a couple more days before writing them off completely.
> 
> > Is there any incident where FSF being the copyright holder for GCC
> > has
> > made a difference?
> 
> Yes, at least in my experience it has been helpful that the FSF held
> copyright of code that had been assigned by various individuals and
> companies. It allowed the merger of GNU Classpath and libgcj for
> example. There have been various intances where it was helpful that
> the FSF could unilatrally adjust the license terms especially when
> the
> original contributor couldn't be found or didn't exist (as company)
> anymore.

This benefit arises from having a single entity own the copyright in
the code.  It doesn't necessarily have to be the FSF to gain this
benefit; it just happens that the FSF currently owns the copyright on
the code.

Another, transitional approach might be to find another Free Software
non-profit and for contributors to start assigning copyright on ongoing
work to that other non-profit.  That way there would be only two
copyright holders on the code; if the FSF somehow survives its current
death-spiral then the other nonprofit could assign copyright back to
the FSF;  if it doesn't, well, we've already got bigger problems.

> And it is really helpful that we don't have to ask permission of
> every
> individual contributor to be able to create the GCC manual (because
> the GPL code and GFDL text could otherwise not be combined) but that
> the FSF can grant an exception to one of the developers to create it.

Alternatively, the copyright holder could relicense the documentation
to a license that is explicitly compatible with the GPL, such as the
GPL itself, and not require us to jump through hoops.  (Or we could
start a non-GFDL body of documentation under a different copyright
holder, but I'm not volunteering for that effort).  In case it's not
clear, I think the GFDL is a terrible license, and that it's always a
mistake to use it for software documentation.

> > Are there any GPL violations involving GCC code
> > that were resolved only because all copyright resides with a single
> > entity, that couldn't have been resolved on behalf of individual
> > copyright holders?
> 
> I think it has been very helpful preventing those violations. If you
> only have individual copyright holders instead of an organisation
> with
> the means to actually resolve such violations people pay much more
> attention to play by the rules. See for example the linux kernel
> project. I believe there are so many GPL violations precisely because
> almost no individual has the means to take up a case.

Again, the "single entity" doesn't need to be the FSF.

> > Are we still worried about BigCorp trying to do a proprietary fork
> > of
> > GCC? Because BigCorp, OtherCorp etc. have shown that they would
> > prefer
> > to create a new toolchain from scratch rather than use GNU code.
> > And
> > if EvilCorp want to make their own proprietary compiler with secret
> > optimizations, they'll just use LLVM instead of bothering to
> > violate
> > the GPL. The work done to make it impossible to steal GCC code was
> > a
> > success: nobody is even interested in stealing it now. There is an
> > easier option.
> 
> I admit it is an odd way for Free Software to win that the only way
> proprietary compiler writers can compete with GCC is by producing a
> lax-permissively licensed compiler. 
> But we should still make sure that GCC
> itself makes it so that users can actually get the sources of the
> compiler they are using and not just some sources that might or might
> not correspond to the binary they are using. Making sure that the
> code
> reaches actual users and not just some corporate hackers to create a
> proprietary compiler is what counts IMHO. And using strong copyleft
> and having a shared copyright pool of code held by an entity that can
> enforce that is still necessary IMHO.
> 
> > Can we break our (already weak) ties to GNU?

It's not clear to me to what extent "GNU" is a thing that exists.  I
agree with much of Andy Wingo's October 2019 blog post:
http://www.wingolog.org/archives/2019/10/08/thoughts-on-rms-and-gnu


IMHO, "GNU" can mean various things:
- the small family of "g"-prefixed toolchain/low-leve

Re: GCC association with the FSF

2021-04-07 Thread David Malcolm via Gcc
On Wed, 2021-04-07 at 10:51 -0400, Alfred M. Szmidt via Gcc wrote:
>    [...]  That "gnu-structure" document was written by RMS a couple of
>    months ago and doesn't represent how the GNU project and its
>    maintainers have worked for years.
> 
> It reflects the same message that has been sent to new GNU
> maintainers
> for decades. The GNU structure and organization document
> (https://www.gnu.org/gnu/gnu-structure.en.html) is basically a
> reflection of that, and how we have been doing things for decades.

"We've always done it this way" is not necessarily a good defence of an
existing practice.

> You can raise any issues you think do not reflect on the lists, or
> with the GNU Advisory Committee.
> 
>    RMS indeed claims to be the "Chief GNUisance" of the GNU project
> and
>    that that title somehow makes him the leader of the project and
> that
>    he appoints GNU maintainers.
> 
> That is true, RMS appoints which projects become GNU projects or not,
> and who maintains them.  And as maintainers we have a lot of freedom,
> as
> can be seen here, and elsewhere.  

What you're describing sounds like a dictatorship to me.

> 
>    The GNU Assembly is having a similar
>    discussion right now
> 
> It should be noted that this group is not associated with the GNU
> project, or represents it in anyway, despite pretending to.

I don't think you get to speak for who is or is not a member of the GNU
project.  As far as I know, "GNU" isn't trademarked.

My opinions, not my employer's, as usual
Dave



Re: GCC association with the FSF

2021-04-07 Thread David Malcolm via Gcc
On Wed, 2021-04-07 at 18:24 +0200, John Darrington wrote:
> On Wed, Apr 07, 2021 at 11:15:14AM -0400, David Malcolm via Gcc
> wrote:
> 
>  > It reflects the same message that has been sent to new GNU
>  > maintainers
>  > for the decades. The GNU structure and organization document
>  > (https://www.gnu.org/gnu/gnu-structure.en.html) is basically a
>  > reflection of that, and how we have been doing things for
> decades.
>  
>  "We've always done it this way" is not necessarily a good
> defence of an
>  existing practice.
> 
> You are right.  The GNU Structure document doesn't claim to be. It
> just
> documents the way things are.
>  
>  > That is true, RMS appoints which projects become GNU projects
> or not,
>  > and who maintains them.  And as maintainers we have a lot of
> freedom,
>  > as
>  > can be seen here, and elsewhere.  
>  
>  What you're describing sounds like a dictatorship to me.
> 
>  I cannot see how you reach that conclusion.

Having one guy at the top from whom all power flows.

What's the process for replacing the guy at the top, if he's become a
liability to the project?  What would a healthy structure look like?

My opinions, not my employer's, as usual
Dave



Re: [GSoC-2021] Interested in project `Extend the static analysis pass`

2021-04-07 Thread David Malcolm via Gcc
On Wed, 2021-04-07 at 01:59 +0530, Saloni Garg wrote:
> Hi, apologies for the delayed reply; I had some college
> commitments.
> On Wed, Mar 31, 2021 at 11:22 PM David Malcolm 
> wrote:
> 
> > On Wed, 2021-03-31 at 21:41 +0530, Saloni Garg wrote:
> > > On Tue, Mar 30, 2021 at 6:42 PM David Malcolm <
> > > dmalc...@redhat.com>
> > > wrote:
> > > 
> > > > On Tue, 2021-03-30 at 16:06 +0530, Saloni Garg wrote:
> > > > > On Sun, Mar 28, 2021 at 8:03 PM David Malcolm <
> > > > > dmalc...@redhat.com>
> > > > > wrote:
> > 

[...snip...]

> > > > Please do work on the formal proposal - without it we can't
> > > > accept
> > > > you
> > > > as a GSoC student.  The window for submitting proposals opened
> > > > yesterday, and I believe it closes in a couple of weeks, and
> > > > you
> > > > need
> > > > to do that, so any experimentation you do now should really
> > > > just be
> > > > in
> > > > support of writing a good proposal.  It would be a shame to not
> > > > have a
> > > > good prospective candidate because they didn't allow enough
> > > > time to
> > > > do
> > > > the proper GSoC paperwork before the deadline.
> > > > 
> > > Thanks for understanding. Here is an initial draft (
> > > 
> > > 
> >   
> > https://docs.google.com/document/d/1inkkU5B55s_FOWRzUuf38s7XEet65kc0dW3yFn0yh1M/edit?usp=sharing
> > > )
> > > of my GSoC proposal. I am yet to fill in the missing blocks.
> > > Please, let me know if you have any comments on the document
> > > itself.
> > 
> > Caveat: I'm not familiar with the expected format of such
> > documents.
> > 
> > Looks like a good first draft.
> 
> I don't think there is any such expected format (I checked some previous
> years' accepted proposals). I believe if we clearly write the expected
> goal and the tentative approach to reach it, that would be okay for the
> proposal.

Looking at:
  https://gcc.gnu.org/wiki/SummerOfCode#Application
we don't have a specific format to be followed.

That said, I saw this 
  https://google.github.io/gsocguides/student/writing-a-proposal
which seems to have useful advice to prospective GSoC students.  In
particular, the "Elements of a Quality Proposal" lists various things
that in your current draft are missing, and which would strengthen your
proposal.  So I'd recommend that you (and other prospective GSoC
candidates) have a look at that.

[...snip...]

> > - in Example 2, maybe spell out why it's a leak - when does the
> > allocated buffer stop being referenceable?
> > 
> Explained. Please let me know if you feel it has any loose ends.

I think the leak is actually at line 8, when the "catch" clause ends. 
Isn't the buffer passed into the exception-state of the thread, and
becomes live during the "catch" clause, but stops being live at the end
of the catch clause?
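
To make that concrete, here's a minimal sketch of the kind of testcase
I have in mind (hypothetical code; your Example 2 presumably differs in
the details):

void test ()
{
  try
    {
      /* The allocation escapes into the exception object...  */
      throw new int (42);
    }
  catch (int *p)
    {
      /* ...and "p" is live only inside this catch clause; there's no
         "delete p;" here...  */
    }
  /* ...so once the catch clause ends, the buffer is no longer
     referenceable, which is where I'd expect the leak to be reported.  */
}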

> 
> > - you have a simple example of a false negative; is it possible to
> > give
> > a simple example of a false positive?  (I think "new" is meant to
> > raise
> > an exception if it fails, so a diagnostics about a NULL-deref on
> > unchecked new might be a false positive.  I'm not sure)
> > 
> I tried the following example:
> 
> #include 
> #include 
> using namespace std;
> int *p;
> int* alloc() {
>    return new int;
> }
> void free() {
>    delete p;
> }

FWIW please don't create a top-level function called "free" that isn't
the C stdlib's free, it's confusing!

> int main()
> {
>    try {
>  p = alloc();
>  free();
>    } catch(...) {
>    }
>    return 0;
> }
> Please, have a look here (https://godbolt.org/z/8WvoaP67n). I believe
> it is
> a false positive, I am not sure, please confirm.

It looks like one to me.  I had a look at -fdump-analyzer-exploded-
graph and the false positive seems to be associated with the edge with
the EH flag (to the catch handler).

[...snip...]

> 
> > - "sample example programs": for "sample" did you mean to write
> > "simple" here?
> > 
> By sample examples, I meant the test cases that shall be used to
> prove the
> correctness of the patches during the course.

Isn't "sample examples" a tautology, though?  (or, at least, my
interpretation of "sample" here makes it so, I think).

> 
> > - as well as understanding code, you'll need to understand data,
> > specifically getting a feel for the kinds of control flow graphs
> > that
> > the analyzer is receiving as inputs i.e. what the analyzer "sees"
> > when
> > the user inputs various C++ language constructs; what
> > interprocedural
> > vs intraprocedural raise/try/catch situations look like, etc.
> > 
> I am in the process to understand how the analyzer works. I believe I
> have
> just got a gist of the approach being used. The gimple graph has been
> processed in a Worklist manner. We analyze each statement and add
> predecessors/successors if there is some new information coming out
> of the
> analyzed statement. 

When we process a node in the worklist, we only ever add successor
nodes, recording the next (point, state) pairs.

> For example, In the memory leak examples discussed
> her

Re: GCC association with the FSF

2021-04-08 Thread David Malcolm via Gcc
On Thu, 2021-04-08 at 08:45 +0200, John Darrington wrote:
> On Wed, Apr 07, 2021 at 06:34:12PM -0400, David Malcolm wrote:
>  >  
>  >  What you're describing sounds like a dictatorship to me.
>  > 
>  >  I cannot see how you reach that conclusion.
>  
>  Having one guy at the top from whom all power flows.
> 
> Power does not "flow" from RMS.  Since you have used a political
> analogy:
> I think it is more akin to a constitutional monarchy.

I grew up in the UK, and am most familiar with the situation there; I
don't have experience of the Australian system.

>  
>  What's the process for replacing the guy at the top, if he's
> become a
>  liability to the project?  What would a healthy structure look
> like?
> 
> Many countries have a single person as head of state with no formally
> defined process for replacing him or her.   Most of those countries
> are not
> usually described as "dictatorships".

It depends on whether the head of state is a mere figurehead, or is
actually in charge.  In the UK, the Queen is nominally in charge of
"her government", but that mostly amounts to merely rubberstamping the
election result, albeit with some limited "soft power" in terms of
gravitas.  I think it remains to be seen if the monarchy will survive
her passing (if indeed the UK is still in its current form at that
point, but that's a whole other can of worms).

> Further, history has shown,  in cases where that head of state has
> been
> forcibly removed (eg France, Russia). the regime that replaced them
> turned
> out to be composed of murderous powermongers concerned with nobody's
> interest
> but their own. 

If we're continuing the political analogy, a counterexample might be
the United States.

>   I for one, will not sit back and let that heppen to GNU.

I think it's important to distinguish between the figurative and
literal here.

No one is literally calling for anyone's head.

Some of us don't want RMS in a leadership position in a project we're
associated with (be it the FSF or GNU, and thus, GCC).

My opinions, not my employer's, as usual.
Dave




Re: GCC association with the FSF

2021-04-08 Thread David Malcolm via Gcc
On Thu, 2021-04-08 at 20:21 +0200, John Darrington wrote:
> On Thu, Apr 08, 2021 at 10:54:25AM -0400, David Malcolm wrote: 

[...]

>  Some of us don't want RMS in a leadership position in a project
> we're
>  associated with (be it the FSF or GNU, and thus, GCC).
> 
> RMS was the first person to be involved in GNU and GCC.  Others
> became
> involved later (under his leadership).  Their contribution was and
> continues to be welcome.  They are also free to stop contributing any
> time they wish to do so.

I intend to continue contributing to GCC (and to Free Software in
general), but RMS is not my leader.

>  
>  My opinions, not my employer's, as usual.
> 
> Then why do you write this from your employer's email?

My employer gives me permission.

>   That is like
> writing it on the company letterhead.

I disagree.

>   I suggest that when speaking
> for yourself you use your own email.

Given the reaction that some have faced for questioning RMS, I'd prefer
to keep that address private.

As before, these are my opinions, not my employer's.

Dave



Re: GCC association with the FSF

2021-04-10 Thread David Malcolm via Gcc
On Sat, 2021-04-10 at 08:17 -0700, Thomas Rodgers wrote:
> On 2021-04-09 14:34, Christopher Dimech wrote:
> 
> > > On the contrary, I eagerly await each and every one of your
> > > missives 
> > > on
> > > this topic, hoping for exactly that very  thing to occur.

[...]

> On 2021-04-10 07:49, Christopher Dimech via Gcc wrote:
> 
> > 
> > 
> > Should we get our ideas from politicians and bureaucrats; or from 
> > Aleksandr
> > Solzhenitsyn, Fyodor Dostoyevsky, Friedrich Nietzsche, Ernest 
> > Hemingway,
> > Aldous Huxley, Marie-Henri Beyle, and Emily Jane Brontë?  From the 
> > latter
> > of course!
> 
> So, that's a solid 'no' on the likelihood of you contributing
> anything 
> of value
> to the discussion of GCC governance then?

Thomas, please don't feed the troll.

Hope this is constructive
Dave




Re: [GSoC-2021] Interested in project `Extend the static analysis pass`

2021-04-10 Thread David Malcolm via Gcc
On Sat, 2021-04-10 at 21:18 +0530, Saloni Garg wrote:
> On Thu, Apr 8, 2021 at 8:19 AM David Malcolm 
> wrote:
> 
> > On Wed, 2021-04-07 at 01:59 +0530, Saloni Garg wrote:

[...]

> > Looking at:
> >   https://gcc.gnu.org/wiki/SummerOfCode#Application
> > we don't have a specific format to be followed.
> > 
> > That said, I saw this
> >   https://google.github.io/gsocguides/student/writing-a-proposal
> > which seems to have useful advice to prospective GSoC students.  In
> > particular, the "Elements of a Quality Proposal" lists various
> > things
> > that in your current draft are missing, and which would strengthen
> > your
> > proposal.  So I'd recommend that you (and other prospective GSoC
> > candidates) have a look at that.
> > 
> Added some new sections. Tried to explain them as well. There are
> some
> things I am not clear about, so explicitly mentioned them and will
> add the
> relevant explanations and present them in the later reports. Please
> let me
> know if this sounds good to you and provide feedback as well.

The updated version looks a lot stronger.

That said, you haven't given details of your programming expertise - in
particular this project will require proficiency with C++, so a good
application would give evidence to the reader that you're already up-
to-speed on writing and debugging C++ (see the "Biographical
Information" section in the guide I linked to above for more info).


[...snip...]

> > FWIW please don't create a top-level function called "free" that
> > isn't
> > the C stdlib's free, it's confusing!
> > 
> Sorry, my bad renamed it to `myfree`.

Thanks!

> > 
> > > int main()
> > > {
> > >    try {
> > >  p = alloc();
> > >  free();
> > >    } catch(...) {
> > >    }
> > >    return 0;
> > > }
> > > Please, have a look here (https://godbolt.org/z/8WvoaP67n). I
> > > believe
> > > it is
> > > a false positive, I am not sure, please confirm.
> > 
> > It looks like one to me.  I had a look at -fdump-analyzer-exploded-
> > graph and the false positive seems to be associated with the edge
> > with
> > the EH flag (to the catch handler).
> > 
> I have understood the exploded graph but not able to understand the
> `EH
> flag` point you are making, so I will get back to you on this.

Edges in control flow graphs can have flags; see the various
DEF_EDGE_FLAG in gcc/cfg-flags.def in the source tree, and in
particular the "EH" flag.

These flags are visible in the analyzer's supergraph - they should
appear in the .dot dump files from the analyzer - so sometimes they're
difficult to see, depending on how GraphViz lays things out.  (FWIW I
use:
  https://github.com/jrfonseca/xdot.py
to view the .dot files; it's fast and convenient)
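
For example, here's a hedged sketch (a GCC-internals fragment, not
runnable on its own; it assumes the usual basic-block headers and a
basic_block "bb" in scope) of how code typically tests for that flag:

edge e;
edge_iterator ei;
FOR_EACH_EDGE (e, ei, bb->succs)
  if (e->flags & EDGE_EH)
    {
      /* This outgoing edge is only taken when an exception is thrown,
         i.e. the kind of edge involved in the false positive above.  */
    }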

[...snip...]

> 
> > > 
> > 
> > Currently the analyzer has a "brute force" approach to
> > interprocedural
> > analysis, and attempts to simulate the calls and returns in a
> > fairly
> > direct way.  It's crude (and has exponential growth), but is
> > reasonably
> > simple conceptually (or at least I think so).  The analyzer
> > implements
> > setjmp/longjmp in a similar way, and exception-handling could be
> > based
> > on that code.
> > 
> Going through that already and your comments at the start of every
> data
> structure defined are really helpful.

Thanks!

> 
[...snip...]

Dave



Re: GCC association with the FSF

2021-04-11 Thread David Malcolm via Gcc
On Sun, 2021-04-11 at 14:07 +0100, Frosku wrote:
> On Sun Apr 11, 2021 at 11:08 AM BST, Didier Kryn wrote:
> > Le 08/04/2021 à 17:00, David Brown a écrit :
> > > At some point, someone in the public relations
> > > department at IBM, Google, Facebook, ARM, or other big supporters
> > > of the
> > > project will get the impression that the FSF and GNU are lead by
> > > a
> > > misogynist who thinks child abuse is fine if the child consents,
> > > and
> > > will cut off all support from the top down.  The other companies
> > > will
> > > immediately follow. 
> > 
> > Here we are. The liberty of expressing opinions is too much of a
> > liberty. This is ironical to read in a mailing list dedicated in
> > some to
> > a free software project.
> 
> He's actually recanted his views about 'consensual pedophilia', which
> is
> testament to the benefits of open dialogue. 

Wow.  Just... wow.

I've been trying to ignore this thread for the sake of my mental health
- it's been going on for 2 weeks now - but I feel I have to speak up
about how wrong-headed the above seems to me.

I don't want to be in an environment where, it turns out, the leader of
the non-profit that owns copyright on the bulk of the last 8 years of
my work, and controls the license on the bulk of my work for the last
20 years, has to be patiently coached in why pedophilia is bad.  Most
reasonable people would run a mile from such an environment.  Think of
what the FSF could have achieved if RMS hadn't driven away all but the
most patient and dedicated people, and the effort exhausted by those
that remain on enabling [1] him to continue in his "leadership" role.

At one time, RMS was a hero and inspiration to me; I remember cutting
out newspaper articles about him when I was in school, and I own a copy
of his book, which he signed for me.  However, that book has been in my
attic for a while now, gathering dust, which seems symbolic to me.

I hope that the FSF can be saved; it would be deeply damaging to
software freedom for it to finish imploding.  It would also be very
inconvenient for those of us trying to improve GCC.

For those with ears to listen, Luis Villa posted this excellent
article, with plenty of ideas on how to save the FSF:
  https://lu.is/blog/2021/04/07/values-centered-npos-with-kmaher/
which I'll quote part of here:

"Many in the GNU and FSF communities seem to worry that moving past RMS
somehow means abandoning software freedom, which should not be the
case. If anything, this should be an opportunity to re-commit to
software freedom - in a way that is relevant and actionable given the
state of the software industry in 2021."

In the meantime, I don't know what GCC should do, but I feel like I
need to go for a walk in the woods to clear my head, away from a
keyboard, rather than spending any more of my weekend stressing about
the project.

I hope this is constructive.  These are my opinions, and not
necessarily those of my employer - though Red Hat has stated that it is
"appalled" at RMS's return to the FSF board [2], and part of my job is
to care about the future of GCC.

Dave

[1] see e.g. https://www.healthline.com/health/enabler
[2] 
https://www.redhat.com/en/blog/red-hat-statement-about-richard-stallmans-return-free-software-foundation-board



Re: [GSoC-2021] Interested in project `Extend the static analysis pass`

2021-04-12 Thread David Malcolm via Gcc
On Sun, 2021-04-11 at 17:06 +0530, Saloni Garg wrote:
> On Sun, Apr 11, 2021 at 12:14 AM David Malcolm 
> wrote:
> 
> > On Sat, 2021-04-10 at 21:18 +0530, Saloni Garg wrote:
> > > On Thu, Apr 8, 2021 at 8:19 AM David Malcolm
> > > 
> > > wrote:
> > > 
> > > > On Wed, 2021-04-07 at 01:59 +0530, Saloni Garg wrote:
> > 
> > [...]
> > 
> > > > Looking at:
> > > >   https://gcc.gnu.org/wiki/SummerOfCode#Application
> > > > we don't have a specific format to be followed.
> > > > 
> > > > That said, I saw this
> > > >     
> > > > https://google.github.io/gsocguides/student/writing-a-proposal
> > > > which seems to have useful advice to prospective GSoC
> > > > students.  In
> > > > particular, the "Elements of a Quality Proposal" lists various
> > > > things
> > > > that in your current draft are missing, and which would
> > > > strengthen
> > > > your
> > > > proposal.  So I'd recommend that you (and other prospective
> > > > GSoC
> > > > candidates) have a look at that.
> > > > 
> > > Added some new sections. Tried to explain them as well. There are
> > > some
> > > things I am not clear about, so explicitly mentioned them and
> > > will
> > > add the
> > > relevant explanations and present them in the later reports.
> > > Please
> > > let me
> > > know if this sounds good to you and provide feedback as well.
> > 
> > The updated version looks a lot stronger.
> > 
> Hi, Thanks for the quick feedback.
> 
> > 
> > That said, you haven't given details of your programming expertise
> > - in
> > particular this project will require proficiency with C++, so a
> > good
> > application would give evidence to the reader that you're already
> > up-
> > to-speed on writing and debugging C++ (see the "Biographical
> > Information" section in the guide I linked to above for more info).
> > 
> Apologies, but I am a beginner in this area of compilers and static
> analysis. I already know some C++ coding which I have used mostly in
> Competitive coding competitions. I have been following this(
> https://www.cse.iitk.ac.in/users/karkare/Courses/cs618/) course to
> understand the nuances of the static analysis. I am confident that I
> can
> write the C++ code that is required here and know how to use tools
> like
> GDB, Valgrind to debug the C++ codes, 

Your proposal would benefit from including something like the above.

We're not expecting Bjarne Stroustrup levels of competence in C++,
especially considering that you are all students - but we need some
ability in C++... which you may already have; it's hard for me to tell
from the current draft proposal as written.  Part of the point of GSoC is
to learn, but to learn about the specifics of the FLOSS project you
apply to [1], rather than the implementation language.

> but I don't have any good projects to
> prove that right now. My college got stopped due to COVID-19 and
> hasn't
> started yet properly, so I have been trying to learn most of the
> things
> online only.
> I hope you understand.

Indeed.


Hope this is helpful; good luck (the deadline to apply is fast
approaching)

Dave

[1] and to learn about what "real" programming is like (for some
definition of "real"), as opposed to the rather artifical programming
that university coursework tends to be like.  For example, GCC has 30+
years of legacy code to maintain, full of weird special cases and dark
corners, with dozens of target configurations - we can't just rewrite
it all, or at least, not quickly :)



Re: On rms controversy

2021-04-14 Thread David Malcolm via Gcc
On Wed, 2021-04-14 at 08:01 +0100, Jonathan Wakely via Gcc wrote:
> On Wed, 14 Apr 2021, 07:50 pawel k. via Gcc,  wrote:
> 
> 

[...snip...]

> Very logical argument, thanks for sharing.

Jonathan, it's clear to me that you're being sarcastic, but it might
not be clear to others.  Please avoid sarcasm - it amplifies
misunderstanding on the internet, and it seems to me that that's
counterproductive when discussing a sensitive topic.

I disagree with much of what Pawel wrote, but this discussion has left
me feeling drained.  I don't have the mental energy to do a point-by-
point response, and I don't think it would be a productive thing to
post to this mailing list.

We all care about GCC, and we're all human.

Dave




Re: removing toxic emailers

2021-04-15 Thread David Malcolm via Gcc
On Thu, 2021-04-15 at 09:49 -0400, Eric S. Raymond wrote:
> Joseph Myers :
> > On Wed, 14 Apr 2021, Eric S. Raymond wrote:
> > 
> > > I'm not judging RMS's behavior (or anyone else's) one way or
> > > another. I am simply pointing out that there is a Schelling point
> > > in
> > > possible community norms that is well expressed as "you shall judge
> > > by
> > > the code alone".  This list is not full of contention from
> > > affirming
> > > that norm, but from some peoples' attempt to repudiate it.
> > 
> > Since RMS, FSF and GNU are not contributing code to the toolchain and
> > haven't been for a very long time, the most similar basis to judge
> > them 
> > would seem to be based on their interactions with toolchain
> > development.  
> > I think those interactions generally show that FSF and GNU have been
> > bad 
> > umbrella organizations for the toolchain since at least when the GCC
> > 4.4 
> > release was delayed waiting for a slow process of developing the GCC 
> > Runtime Library Exception.
> 
> I do not have standing to argue this point.
> 
> I will, however, point out that it is a very *different* point from
> "RMS has iupset some people and should therefore be canceled".

[I'm sorry to everyone who's sick of these threads, but I feel I have
to respond to this one; sorry about writing another long email]

Eric: I don't know if you're just being glib, or you're deliberately
trying to caricature those of us who are upset by RMS's behavior.

I think the words "canceled" and "cancel culture" have effectively
become meaningless and should be avoided if we want to have a nuanced
discussion - no-one seems to have a definition of what counts as
"canceling" vs "consequences" vs "fair and measured responses".

At one time, both you and RMS were heroes of mine, and I was a true
believer (of what, I'm no longer sure); I own copies of both "The
Cathedral and the Bazaar" and "Free Software - Free Society", though
both are currently in my attic, gathering dust.

I've long felt that there was a massive hole in the GNU project and FSF
where effective technical leadership should have been - various
maintainers on gcc, gdb, etc have been implementing things, and things
were humming along, and those of us in Red Hat working on them tried to
coordinate on features we felt were important - but where was the top-
level response to, say, LLVM/clang? (to name just one of many changes
in the industry)  In many ways the last 8 years of my career have been
an attempt to get gcc to respond to the appearance of LLVM/clang (I've
added JIT-compilation, improved diagnostics, and I'm implementing a
static analysis pass) - I'm lucky that my managers inside Red Hat are
happy to pay me to hack on this stuff and make GCC better - it helps
our customers, but it also helps GCC, and the broader FLOSS communities
using both toolchains.

Where has the technical leadership from RMS been?  Instead the long-
standing opposition by RMS to exposing the compiler's IR has hobbled
GCC, and partly contributed to the pile of technical debt we have to
dig our way out of.  The only "leadership" coming out of GNU/FSF seem
to me to be dictats from on high about ChangeLog formats and coding
conventions.  The GNU project seems to me to be stuck in the 1980s. 
Perhaps a pronouncement like: "try to make everything be consumable as
libraries with APIs, as well as as standalone binaries" might have
helped (and still could; can we do that please?)

Similarly, I agree with Joseph's observations of the ways that the FSF
and GNU have been bad umbrella organizations for the toolchain.

But beyond the failure of technical leadership, and the organizational
incompetence/incoherence, is RMS's behavior, and the extent to which
it, as you put it "upset some people".

RMS's defenders seem to have fixated on his 2019 comments on Marvin
Minsky, the uproar over those, and his responses to them (then and
recently), and seem keen to assure us that everything's OK now, or, at
least on a road to improvement.

But in the time since those 2019 comments, I've been reconsidering my
views on RMS.  In particular, I have read of many alleged incidents
such as:
 - spontaneously licking a female conference member on the arm
 - appearing to hit on anyone female, even if they're underage
 - asking which female audience members at his talk were virgins

At least one of the above was from a former colleague of mine, which
when I read it was about the point that broke me.

As part of my reconsidering my views on RMS, I recalled an event
described in Sam Williams' biography of RMS in which Williams describes
RMS's then girlfriend talking about how she "admired the way Richard
built up an entire political movement to address an issue of profound
personal concern", which she identified as "crushing loneliness".

When I first read that, years ago, I felt sorry and pity for RMS, and a
vague feeling that community is an important part of FLOSS, or somesuch
sentiment (and a feeling of trying t

Re: removing toxic emailers

2021-04-15 Thread David Malcolm via Gcc
On Thu, 2021-04-15 at 16:26 -0400, Chris Punches wrote:
> What I see here in sum is another high level tightly integrated Red
> Hat
> employee saying the gist of "I'm really not saying it out of my
> employer's interest and it has nothing to do with my personal
> feelings".

I'm not sure I'm "high level", but I guess I'll take that as a
compliment.

I stated that the opinions in my screed were my own, but I'm a former
FLOSS enthusiast in the fortunate position of being paid to work on
GCC.  I've tried to be open about my biases.

> 
> Every single proponent of this argument that I have seen so far is
> employed by one of the same 5 companies and "really isn't doing it on
> behalf of my company I swear".  
> 
> Why is it almost exclusively that specific crowd saying it here,
> then?

Because, sadly, there's only a small group of companies that employ GCC
developers.  These developers tend to have an emotional attachment to
the project (e.g. a broad agreement with the professed goals of the
FSF).  Part of the reason I work at Red Hat is that its own internal
culture aligns with mine, much of the time, anyway (and I know we're
not perfect).

Hence there's some correlation between those with strong opinions on
the project and those who are being paid to work on it.  I don't see
that as malicious or a conspiracy - just that we, reasonably, care
about the work we do and its context.  It's not necessarily just a job
for me.

> 
> I just don't buy it.  Please say anything that would not support the
> emerging theory that these companies are using integrated employees
> to
> try to emulate justification/pretext for a rift to attack the free
> software world.  Anything at all.

I hope I just did.

Dave




Re: Gcc as callable libraries (was: removing toxic emailers)

2021-04-15 Thread David Malcolm via Gcc
On Thu, 2021-04-15 at 21:48 +0200, Thomas Koenig wrote:
> David,
> 
> for some reason or other, I did not get your mail, so I will
> just reply copying in from the archive.
> 
> First, thanks for injecting some sanity into the discussion.

Thanks Thomas

> I will not discuss RMS' personal shortcomings or the lack of them.
> In today's toxic political climate, such allegations are often
> made up and weaponized without an effective defense for the
> alleged wrongdoer.  I don't know the truth of the matter, and I make
> a point of not finding out.

Fair enough.

>  > In many ways the last 8 years of my career have been
>  > an attempt to get gcc to respond to the appearance of LLVM/clang
> (I've
>  > added JIT-compilation, improved diagnostics, and I'm implementing
> a
>  > static analysis pass)
> 
> And this is highly welcome, and has made gcc (including gfortran) a
> much
> better compiler.  I well remember how you implemented the much better
> colored error messages that gfortran has now.

I've added a bunch of features to the C and C++ frontends (underlined
ranges, labelling of such ranges, fix-it hints, etc), but I don't have
the Fortran skills to know what would be appropriate to gfortran.  Let
me know if you have ideas for specific improvements to how gfortran
diagnostics work that I might be able to help implement.

> 
>  > Perhaps a pronouncement like: "try to make everything be
> consumable as
>  > libraries with APIs, as well as as standalone binaries" might have
>  > helped (and still could; can we do that please?)
> 
> That makes perfect sense, as LLVM shows, and is something that the
> steering committee could decide for the project (or rather, it could
> issue a pronouncement that this will not be opposed if some volunteer
> does it).
> 
> I think this could be as close to an unanimous decision as there can
> be among such a diverse community as the gcc developers.  If the FSF
> takes umbrage at this, the ball is in their court.

I deliberately added the weasel-words "try to", because these things
are, of course, much easier said than done.

I attempted to reduce gcc's use of global state back in 2013 with a
view to making it a shared library, but eventually the sheer size of
the task overwhelmed me.  In libgccjit I hid everything behind a
separate API, with a bug mutex guarding all of gcc's global state,
which feels like something of a cop-out.

One idea I had would be to refactor out our diagnostics code into a
libdiagnostics (or similar), so that all of the source-
printing/underlining/fix-it logic etc could be used outside of gcc, and
the use of diagnostic_context might help towards that.  But even "just"
that's decidedly non-trivial.

Hope this is constructive
Dave




Re: Gcc as callable libraries (was: removing toxic emailers)

2021-04-15 Thread David Malcolm via Gcc
On Thu, 2021-04-15 at 17:31 -0400, David Malcolm via Gcc wrote:
> On Thu, 2021-04-15 at 21:48 +0200, Thomas Koenig wrote:

[...snip...]

> >  > Perhaps a pronouncement like: "try to make everything be
> > consumable as
> >  > libraries with APIs, as well as as standalone binaries" might
> > have
> >  > helped (and still could; can we do that please?)
> > 
> > That makes perfect sense, as LLVM shows, and is something that the
> > steering committee could decide for the project (or rather, it
> > could
> > issue a pronouncement that this will not be opposed if some
> > volunteer
> > does it).
> > 
> > I think this could be as close to an unanimous decision as there
> > can
> > be among such a diverse community as the gcc developers.  If the
> > FSF
> > takes umbrage at this, the ball is in their court.
> 
> I deliberately added the weasel-words "try to", because these things
> are, of course, much easier said that done.
> 
> I attempted to reduce gcc's use of global state back in 2013 with a
> view to making it a shared library, but eventually the sheer size of
> the task overwhelmed me.  In libgccjit I hid everything behind a
> separate API, with a bug mutex guarding all of gcc's global state,
   ~~~
   big, I meant to write.

> which feels like something of a cop-out.

libgccjit calls into as and ld, which shows up in the profile, so
another idea I dabbled with, in the whole "libraries rather than just
executables" area, was to make as and ld buildable as shared libraries;
hence this 2015 experiment:

"[PATCH 00/16] RFC: Embedding as and ld inside gcc driver and into
libgccjit"
crossposted between gcc-patches and binutils here:
  https://gcc.gnu.org/legacy-ml/gcc-patches/2015-06/msg00116.html
  https://sourceware.org/legacy-ml/binutils/2015-06/msg00010.html

(admittedly my prototype had a barely-existent API, but it gave me a 5x
speedup on a synthetic benchmark, which was dominated by the overhead
of dynamically linking libbfd into as and ld on each invocation IIRC;
better to do it once when libgccjit is linked into the process).

> 
> One idea I had would be to refactor out our diagnostics code into a
> libdiagnostics (or similar), so that all of the source-
> printing/underlining/fix-it logic etc could be used outside of gcc, and
> the use of diagnostic_context might help towards that.  But even "just"
> that's decidedly non-trivial.
> 
> Hope this is constructive
> Dave
> 
> 




Re: removing toxic emailers

2021-04-18 Thread David Malcolm via Gcc
On Sun, 2021-04-18 at 09:10 -0400, Eric S. Raymond wrote:

Sorry for prolonging this thread-of-doom; I'm loathe to reply to Eric
because I worry that it will encourage him.  I wrote a long rebuttal to
his last email to me about his great insights into the minds of women
but didn't send it in the hope of reducing the temperature of the
conversation.

That said...

> Ian Lance Taylor via Gcc :
> > This conversation has moved well off-topic for the GCC mailing lists.
> > 
> > Some of the posts here do not follow the GNU Kind Communication
> > Guidelines
> > (https://www.gnu.org/philosophy/kind-communication.en.html).
> > 
> > I suggest that people who want to continue this thread take it off
> > the
> > GCC mailing list.
> > 
> > Thanks.
> > 
> > Ian
> 
> Welcome to the consequences of abandoning "You shall judge by the code
> alone."
> 
> This is what it will be like, *forever*, until you reassert that norm.

Or we could ignore the false dilemma that Eric is asserting, and
instead moderate the list, or even just moderate those who have never
contributed to GCC but persist in emailing the list.

Personally, I've been moving all posts by Christopher Dimech to this
list direct from my inbox to my archive without reading them for the
last several days, and it's helped my mood considerably.  He's been
prolifically posting to the list recently, but in the 8 years I've been
involved in gcc development I've never heard of him before this thing
kicked off, and the stuff I've had the misfortune to see by him appears
to me to be full of conspiracy theories and deranged raving.  The clue
might have been when he referred to us as "bitches".

"Don't feed the trolls" might have worked once, but sometimes they
start talking to each other, and it becomes difficult for a bystander
to tell that everyone else is ignoring them, and it keeps threads like
this one alive.

I reject the idea that those of us who work on GCC have to put up with
arbitrary emails from random crazies on the internet without even the
simple recourse of being able to put individuals on moderation.  That
might have worked 20 years ago when I thought ESR was relevant, but
seems absurdly out-of-date to me today.

As usual, these are my opinions only, not necessarily those of my
employer

Dave




Re: "musttail" statement attribute for GCC?

2021-04-23 Thread David Malcolm via Gcc
On Fri, 2021-04-23 at 12:44 -0700, Josh Haberman via Gcc wrote:
> Would it be feasible to implement a "musttail" statement attribute in
> GCC to get a guarantee that tail call optimization will be performed?
> 
> Such an attribute has just landed in the trunk of Clang
> (https://reviews.llvm.org/D99517). It makes it possible to write
> algorithms that use arbitrarily long chains of tail calls without risk
> of blowing the stack. I would love to see something like this land in
> GCC also (ultimately I'd love to see it standardized).


FWIW I implemented something like this in GCC's middle-end here:
  
https://gcc.gnu.org/git/?p=gcc.git;a=commitdiff;h=9a385c2d3d74ffed78f2ed9ad47b844d2f294105
exposing it in API form for libgccjit:
  
https://gcc.gnu.org/git/?p=gcc.git;a=commitdiff;h=15c671a79ca66df5b1de70dd1a0b78414fe003ef
 
https://gcc.gnu.org/onlinedocs/jit/topics/expressions.html#c.gcc_jit_rvalue_set_bool_require_tail_call
but IIRC it's not yet exposed to the regular GCC frontends.
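
For reference, here's an untested sketch of what using that libgccjit
entrypoint looks like; "countdown" is a made-up example function whose
self-recursive call is required to be a tail call:

#include <libgccjit.h>

/* Build: int countdown (int n) { if (n == 0) return 0;
                                  return countdown (n - 1); }
   with the recursive call marked as requiring tail-call optimization.  */
int
main (void)
{
  gcc_jit_context *ctxt = gcc_jit_context_acquire ();
  gcc_jit_type *int_type
    = gcc_jit_context_get_type (ctxt, GCC_JIT_TYPE_INT);

  gcc_jit_param *n = gcc_jit_context_new_param (ctxt, NULL, int_type, "n");
  gcc_jit_function *fn
    = gcc_jit_context_new_function (ctxt, NULL, GCC_JIT_FUNCTION_EXPORTED,
                                    int_type, "countdown", 1, &n, 0);
  gcc_jit_block *b_entry = gcc_jit_function_new_block (fn, "entry");
  gcc_jit_block *b_zero = gcc_jit_function_new_block (fn, "is_zero");
  gcc_jit_block *b_recurse = gcc_jit_function_new_block (fn, "recurse");

  gcc_jit_rvalue *zero = gcc_jit_context_zero (ctxt, int_type);
  gcc_jit_block_end_with_conditional
    (b_entry, NULL,
     gcc_jit_context_new_comparison (ctxt, NULL, GCC_JIT_COMPARISON_EQ,
                                     gcc_jit_param_as_rvalue (n), zero),
     b_zero, b_recurse);
  gcc_jit_block_end_with_return (b_zero, NULL, zero);

  gcc_jit_rvalue *n_minus_1
    = gcc_jit_context_new_binary_op (ctxt, NULL, GCC_JIT_BINARY_OP_MINUS,
                                     int_type,
                                     gcc_jit_param_as_rvalue (n),
                                     gcc_jit_context_one (ctxt, int_type));
  gcc_jit_rvalue *call
    = gcc_jit_context_new_call (ctxt, NULL, fn, 1, &n_minus_1);

  /* The "musttail"-like part: require that this call be implemented as
     a tail call, or fail at compile time.  */
  gcc_jit_rvalue_set_bool_require_tail_call (call, 1);

  gcc_jit_block_end_with_return (b_recurse, NULL, call);

  gcc_jit_result *result = gcc_jit_context_compile (ctxt);
  gcc_jit_result_release (result);
  gcc_jit_context_release (ctxt);
  return 0;
}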

Dave

> 
> I wrote more about some motivation for guaranteed tail calls here:
>  
> https://blog.reverberate.org/2021/04/21/musttail-efficient-interpreters.html
> 
> GCC successfully optimizes tail calls in many cases already. What
> would it take to provide an actual guarantee, and make it apply to
> non-optimized builds also?
> The Clang implementation enforces several rules that must hold for the
> attribute to be correct, including:
> 
> - It must be on a function call that is in tail position.
> - Caller and callee must have compatible function signatures
> (including the implicit "this", if any), differing only in cv
> qualifiers.
> - Caller and callee must use the same calling convention.
> - Caller and callee may not be constructors or destructors.
> - All arguments, the return type, and any temporaries created must be
> trivially destructible.
> - All variables currently in scope must be trivially destructible.
> - The caller cannot be in a try block, an Objective-C block, or a
> statement expression.
> 
> Some of these restrictions may be overly strict for some calling
> conventions, but they are useful as the "least common denominator"
> that should be safe in all cases. When implementing this in Clang, we
> found that we were able to reuse some of the checks around goto and
> asm goto, since a tail call is sort of like a goto out of the
> function's scope.
> 
> Thanks,
> Josh





Re: Some really strange GIMPLE

2021-04-27 Thread David Malcolm via Gcc
On Tue, 2021-04-27 at 20:10 +, Gary Oblock via Gcc wrote:
> I'm chasing a bug and I used Creduce to produce a
> reduced test case. However, that's really beside to
> point.
> 
> I this file:
> 
> typedef struct basket {
> } a;
> long b;
> a *basket;
> int d, c, e;
> a *flake[2];
> void primal_bea_mpp();
> void primal_net_simplex() {
>   flake[1] = &basket[1];
>   primal_bea_mpp(d, d, d, b, flake, 0, e, c, c, d);
> }
> 
> Produces this GIMPLE:
> -
> ;; Function primal_net_simplex (primal_net_simplex, funcdef_no=3,
> decl_uid=4447, cgraph_uid=16, symbol_order=41) (executed once)
> 
> primal_net_simplex ()
> {
>    [local count: 1073741824]:
>   _1 = basket;
>   static struct a * flake[2];
> struct a *[2]
>   flake[1] = _1;
>   _2 = d;
>   _3 = c;
>   _4 = e;
>   _5 = b;
>   primal_bea_mpp (_2, _2, _2, _5, &flake, 0, _4, _3, _3, _2);
>   return;
> 
> }
> --
> These standard calls were used to dump this:
> 
>   FOR_EACH_FUNCTION_WITH_GIMPLE_BODY ( node)
>   {
>     struct function *func = DECL_STRUCT_FUNCTION ( node->decl);
>     dump_function_header ( file, func->decl, (dump_flags_t)0);
>     dump_function_to_file ( func->decl, file, (dump_flags_t)0);
>   }
> 
> The GIMPLE above looks malformed to me. Is that the case
> or am I not grasping what's going on here?

What about it looks malformed to you?

The declaration of primal_bea_mpp and primal_net_simplex could probably
use some parameters, rather than being empty, which might make things
look more sane [1].  I think -Wstrict-prototypes will catch this.
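
For example, something along these lines (with the parameter types
guessed from the call site; the real code presumably differs):

void primal_bea_mpp (int, int, int, long, a *[2], int, int, int, int, int);
void primal_net_simplex (void);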

Dave

[1] 
https://wiki.sei.cmu.edu/confluence/display/c/DCL20-C.+Explicitly+specify+void+when+a+function+accepts+no+arguments



RFC: attributes for marking security boundaries (system calls/ioctls, user vs kernel pointers etc)

2021-04-29 Thread David Malcolm via Gcc
I've been going through old Linux kernel CVEs, trying to prototype some
possible new warnings for -fanalyzer in GCC 12 (and, alas, finding
places where the analyzer internals need work...)

I think I want a way for the user to be able to mark security
boundaries in their code: for example:
* in the Linux kernel the boundary between untrusted user-space data
and kernel data, or,
* for a user-space daemon, the boundary between data coming from the
network and the data of daemon itself

The analyzer could then make use of this, for example:

(a) marking untrusted incoming data as "tainted" and prioritizing
analysis of paths that make use of it (e.g. a "might overflow a buffer
when N is really large" goes from being a noisy false positive when we
simply have no knowledge of N (or the buffer's size) to being a serious
issue if N is under the control of an attacker).

(b) copying uninitialized data back to the untrusted region becomes a
potential disclosure of sensitive information

I think I also want a way to mark system calls and ioctl
implementations, so that I can mark all of the parameters as being
potentially hostile.

Specifically, the Linux kernel uses functions like this:
  #define __user
  extern long copy_to_user(void __user *to, const void *from, unsigned long n);
  extern long copy_from_user(void *to, const void __user *from, long n);

in various places, so I want a way to mark the "to" and "from" params
as being a security boundary.

I've been experimenting with implementing (b) for CVE-2011-1078 (in
which a copy_to_user is passed a pointer to an on-stack buffer that
isn't fully initialized, hence a disclosure of information to user-
space).
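
For reference, here's a hedged, much-simplified sketch of the kind of
pattern involved (not the actual kernel code), reusing the declarations
above:

struct reply
{
  int a;
  char b;   /* followed by padding bytes on most targets */
  int c;
};

long example_ioctl (void __user *arg)  /* hypothetical ioctl handler */
{
  struct reply r;
  r.a = 1;
  r.c = 2;
  /* r.b and the padding are never written, so copying the whole struct
     discloses uninitialized stack bytes to user-space.  */
  return copy_to_user (arg, &r, sizeof (r));
}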

Martin: I believe you added __attribute__((access)) in GCC 9.

I was thinking of extending it to allow something like:

#define __user
extern long copy_to_user(void __user *to, const void *from, unsigned long n)
  __attribute__((access (untrusted_write, 1, 3),
 access (read_only, 2, 3)
 ));

extern long copy_from_user(void *to, const void __user *from, long n)
  __attribute__((access (write_only, 1, 3),
 access (untrusted_read, 2, 3)
 ));

so that to "to" and "from" and marked as being writes and reads of up
to size n, but they are flagged as "untrusted" as appropriate, so the
analyzer can pay particular attention as described above.
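
For comparison, here's roughly how the existing forms of the attribute
are used today (a hedged example, not taken from real code):

__attribute__ ((access (write_only, 1, 3), access (read_only, 2, 3)))
char *my_copy (char *dst, const char *src, unsigned long n);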

Does the above idea sound like a sane extension of the access
attribute?

I tried implementing it, but "access" seems to get converted to its own
microformat for expressing these things as strings (created via
append_access_attr, and parsed in e.g. init_attr_rdwr_indices), which
seems to make it much harder than I was expecting.

Any thoughts about how to mark system calls/ioctls?  The simplest would
be an attribute that marks all parameters as being untrusted, and the
return value, somehow.

Thanks
Dave



Re: progress update after initial GSoC virtual meetup

2021-06-01 Thread David Malcolm via Gcc
On Sun, 2021-05-30 at 20:38 +0530, Ankur Saini wrote:
> hello 

Hi Ankur, sorry about the delayed reply (it was a long weekend here in
the US)

> I was successfully able to build gcc with bootstrapping disabled and
> using xgcc directly from the build directory instead ( reducing the
> overall build time a lot, although it still takes about half an hour to
> build but it’s much faster than before ). 

Excellent.


> Also I was also able to run one single test on the built compiler.

Great.

> 
> Is there anything else I should be knowing to aid in development or
> should we start planing and preparing towards the project so that we
> can have a head start during coding phase ?

I tried brainstorming "what does a new contributor to GCC's analyzer
need to be able to do"; here's what I came up with:

- able to build the analyzer from source *quickly*, for hacking on the
code.  i.e. with --disable-bootstrap.  We want to minimize the time it
takes to, say, hack in a print statement into a single .cc file in the
analyzer subdirectory, rebuild, and rerun.   With bootstrapping
disabled, if you run "make -jsome-number-of-cores" from the build
directory's "gcc" subdirectory, it should merely rebuild the .o file
for the .cc you touched, and do some relinking (and rerun the
selftests); hopefully such an edit should take less than a minute
before you're able to run the code and see the results.

It sounds like you're close to being able to do that.

(FWIW I tend to use gdb rather than putting in print statements, I tend
to hack in gcc_unreachable into conditions that I hope are being hit,
so that execution will stop at that point in gdb if my assumptions are
correct, and then I can print things, inject calls, etc in gdb)

- able to build the analyzer with a full bootstrap (--enable-bootstrap
is the default) and to run the regression test suites ("make check
-jnumber-of-cores").  On the fastest box I have (64 cores, 128 GB of
RAM) this takes about 45 minutes to do the build and about 45 minutes
to do the testsuites; it used to take up to three hours total when I
was running it on a laptop (and thus was a major pain as it's no fun to
have a hot noisy laptop for several hours).  Maybe it's best to have an
account on the GCC compile farm for this:
  https://gcc.gnu.org/wiki/CompileFarm
IIRC you already have such an account.  It might be worth trying out a
full bootstrap and testsuite run on one of the powerful machines in the
farm.   I tend to use "screen" in case my ssh connection drops partway
through a build, so that losing the ssh connection doesn't kill the
build.

- able to step through the code in the debugger.  IIRC you've already
been doing that.

- copyright assignment paperwork to the FSF.  IIRC you've already done
that.

- ability to run just a single test in the testsuite, rather than the
whole lot (so that you can easily develop new tests without having to
run everything each time you make an edit to a test).  As you say
above, you've done that.

- the analyzer has testcases for C, C++ and Fortran, so you might want
to figure out the argument you need for --enable-languages= when
configuring GCC to enable those languages (but probably no others when
hacking, to speed of rebuilding GCC).  Obviously you'll need C++, as
C++ support is the point of your project.

- it might be good to create a personal branch on the gcc git
repository that you can push your work to.  I'm in two minds about
this, in that ideally you'd just commit your work to trunk once each
patch is approved, but maybe it's good to have a public place as a
backup of the "under development" stuff?  Also, at some point we want
you to be pushing changes to the trunk, so we'll want your account to
be able to do that.

I hope all the above makes sense.  Don't hesitate to ask questions;
finding things out is the whole point of this part of the GSoC
schedule.

Can anyone think of something else that's worth sorting out in this
preliminary phase?

I don't think you're meant to be spending more than an hour or so a
week in this preliminary phase until the coding period officially
starts on Monday 7th.

If you're *really* eager to start, you might want to look at 
  https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100546
This is a case where the analyzer "sees" a call through a function
pointer, and, despite figuring out what the function pointer actually
points to, entirely fails to properly handle the call, since the
supergraph and engine.cc code is looking at the static callgraph, and
needs work to handle such calls through function pointers.  I started
debugging this a couple of weeks ago, and realized it has a *lot* of
similarities to the vtable case, so thought I might leave it so you can
have a go at it once the project starts properly.  That said, before
the 7th you're meant to be focusing on schoolwork, I think, so we
really ought to be merely just sorting out accounts, ensuring your
coding environment is set up, etc.

Hope this is helpful

Dave



[PATCH] MAINTAINERS: create DCO section; add myself to it

2021-06-01 Thread David Malcolm via Gcc
On Tue, 2021-06-01 at 10:00 -0400, David Edelsohn via Gcc wrote:
> GCC was created as part of the GNU Project but has grown to operate
> as
> an autonomous project.
> 
> The GCC Steering Committee has decided to relax the requirement to
> assign copyright for all changes to the Free Software Foundation. 
> GCC
> will continue to be developed, distributed, and licensed under the
> GNU
> General Public License v3.0. GCC will now accept contributions with
> or
> without an FSF copyright assignment. This change is consistent with
> the practices of many other major Free Software projects, such as the
> Linux kernel.
> 
> Contributors who have an FSF Copyright Assignment don't need to
> change anything.  Contributors who wish to utilize the Developer
> Certificate
> of Origin[1] should add a Signed-off-by message to their commit
> messages.
> Developers with commit access may add their name to the DCO list in
> the
> MAINTAINERS file to certify the DCO for all future commits in lieu of
> individual
> Signed-off-by messages for each commit.
> 
> The GCC Steering Committee continues to affirm the principles of Free
> Software, and that will never change.
> 
> - The GCC Steering Committee
> 
> [1] https://developercertificate.org/
> 

The MAINTAINERS file doesn't seem to have such a "DCO list"
yet; does the following patch look like what you had in mind?

ChangeLog

* MAINTAINERS: Create DCO section; add myself to it.
---
 MAINTAINERS | 12 
 1 file changed, 12 insertions(+)

diff --git a/MAINTAINERS b/MAINTAINERS
index db25583b37b..1148e0915cf 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -685,3 +685,15 @@ Josef Zlomek   

 James Dennett  
 Christian Ehrhardt 
 Dara Hazeghi   
+
+
+DCO
+===
+
+Developers with commit access may add their name to the following list
+to certify the DCO (https://developercertificate.org/) for all
+future commits in lieu of individual Signed-off-by messages for each commit.
+
+   DCO list(last name alphabetical order)
+
+David Malcolm  
-- 
2.26.3



Re: progress update after initial GSoC virtual meetup

2021-06-08 Thread David Malcolm via Gcc
On Tue, 2021-06-08 at 21:20 +0530, Ankur Saini wrote:
> 
> 
> > On 01-Jun-2021, at 6:38 PM, David Malcolm 
> > wrote:
> > 

[...snip...]

> > Maybe it's best to have an
> > account on the GCC compile farm for this:
> >  https://gcc.gnu.org/wiki/CompileFarm
> > IIRC you already have such an account.  It might be worth trying
> > out a
> > full bootstrap and testsuite run on one of the powerful machines in
> > the
> > farm.   I tend to use "screen" in case my ssh connection drops partway
> > through a build, so that losing the ssh connection doesn't kill the
> > build.
> 
> I tried this, and it's awesome :D , I was able to complete the entire
> bootstrap build on one of the powerful machines there in almost the
> same time that my laptop takes to build with bootstrap
> disabled.

Great.

[...snip...]

> 
> > - it might be good to create a personal branch on the gcc git
> > repository that you can push your work to.  I'm in two minds about
> > this, in that ideally you'd just commit your work to trunk once
> > each
> > patch is approved, but maybe it's good to have a public place as a
> > backup of the "under development" stuff?  Also, at some point we
> > want
> > you to be pushing changes to the trunk, so we'll want your account
> > to
> > be able to do that.
> 
> I already did that when I was fiddling around with the source code
> and to track my changes separately

Is there a URL for your branch?

> > 
> > If you're *really* eager to start, you might want to look at 
> >  https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100546
> > This is a case where the analyzer "sees" a call through a function
> > pointer, and, despite figuring out what the function pointer
> > actually
> > points to, entirely fails to properly handle the call, since the
> 
> > supergraph and engine.cc code is looking at the static callgraph,
> > and
> > needs work to handle such calls through function pointers.  I
> > started
> > debugging this a couple of weeks ago, and realized it has a *lot*
> > of
> > similarities to the vtable case, so thought I might leave it so you
> > can
> > have a go at it once the project starts properly.
> 
> yes, looking at exploded graph, the analyzer is not able to
> understand the call to function “noReturn()” when called via a
> function pointer ( noReturnPtr.0_1 ("”); ) at all. I would be
> looking into it and will report back as soon as I find something
> useful.

The issue is that the analyzer currently divides calls into
(a) calls where GCC's middle-end "knows" which function is called, and
thus the call site has a cgraph_node.
(b) calls where GCC's middle-end doesn't "know" which function is
called.

The analyzer handles
  (a) by building call and return edges in the supergraph, and
processing them, and
  (b) with an "unknown call" handler, which conservatively sets lots of
state to "unknown" to handle the effects of an arbitrary call, and
where the call doesn't get its own exploded_edge.

In this bug we have a variant of (b), let's call it (c): GCC's middle-
end doesn't know which function is called, but the analyzer's
region_model *does* know at a particular exploded_node.  I expect this
kind of thing will also arise for virtual function calls.  So I think
you should look at supergraph.cc at where it handles calls; I think we
need to update how it handles (b), so that it can handle the (c) cases,
probably by splitting supernodes at all call sites, rather than just
those with cgraph_edges, and then creating exploded_edges (with custom
edge info) for calls where the analyzer "figured out" what the function
pointer was in the region_model, even if there wasn't a cgraph_node.
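
To make "case (c)" concrete, here's a minimal sketch of the kind of
testcase I mean (hypothetical names; the testcase in the PR differs in
the details):

extern void called_fn (const char *msg);

static void (*fn_ptr) (const char *) = called_fn;

void test (void)
{
  /* There's no cgraph_edge for this call, since the middle-end only
     sees an indirect call; but the analyzer's region_model can know
     that fn_ptr points to called_fn at this point.  */
  fn_ptr ("hello");
}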

Does that make sense?

Or you could attack the problem from the other direction, by looking at
what GCC generates for a vfunc call, and seeing if you can get the
region_model to "figure out" what the function pointer is at a
particular exploded_node.

> 
> also, should I prefer discussing about this bug here( gcc mailing
> list) or on the bugzilla itself ?

Either way works for me.  Maybe on this list?  (given that this feels
like a design question)

Hope this is helpful
Dave





Re: progress update after initial GSoC virtual meetup

2021-06-13 Thread David Malcolm via Gcc
On Sun, 2021-06-13 at 19:11 +0530, Ankur Saini wrote:
> 
> 
> > On 08-Jun-2021, at 11:24 PM, David Malcolm 
> > wrote:
> > 
> > Is there a URL for your branch?
> 
> no, currently it only local branch on my machine. Should I upload it on
> a hosting site ( like GitHub ) ? or can I create a branch on remote
> also ?

At some point we want you to be able to push patches to trunk, so as a
step towards that I think it would be good for you to have a personal
branch on the gcc git repository.

A guide to getting access is here:
  https://gcc.gnu.org/gitwrite.html

I will sponsor you.

> 
> > The issue is that the analyzer currently divides calls into
> > (a) calls where GCC's middle-end "knows" which function is called,
> > and
> > thus the call site has a cgraph_node.
> > (b) calls where GCC's middle-end doesn't "know" which function is
> > called.
> > 
> > The analyzer handles
> >  (a) by building call and return edges in the supergraph, and
> > processing them, and
> >  (b) with an "unknown call" handler, which conservatively sets lots
> > of
> > state to "unknown" to handle the effects of an arbitrary call, and
> > where the call doesn't get its own exploded_edge.
> 
> > 
> > In this bug we have a variant of (b), let's call it (c): GCC's
> > middle-
> > end doesn't know which function is called, but the analyzer's
> > region_model *does* know at a particular exploded_node.
> 
> but how will we know this at the time of creation of the supergraph?
> aren't the exploded graph and regional model created after the supergraph?

You are correct.

What I'm thinking is that when we create the supergraph we should split
the nodes at more calls, not just at those calls that have a
cgraph_edge, but also at those that are calls to an unknown function
pointer (or maybe even split them at *all* calls).

Then, later, when engine.cc is building the exploded_graph, the
supergraph will have a superedge for those calls, and we can create an
exploded_edge representing the call.  That way if we discover the
function pointer then (rather than having it from a cgraph_edge), we
can build exploded nodes and exploded edges that are similar to the "we
had a cgraph_edge" case.  You may need to generalize some of the event-
handling code to do this.

Does that make sense?

You might want to try building some really simple examples of this, to
make it as easy as possible to see what's happening, and to debug.

> 
> >  I expect this kind of thing will also arise for virtual function
> > calls.
> 
> yes, it would be a similar case as if the call is not devirtualised,
> GCC’s middle-end would not know which function is being called but our
> regional model would know about the same.

Yes.

> 
> >  So I think you should look at supergraph.cc at where it handles
> > calls; I think we
> > need to update how it handles (b), so that it can handle the (c)
> > cases,
> > probably by splitting supernodes at all call sites, rather than
> > just
> > those with cgraph_edges, and then creating exploded_edges (with
> > custom
> > edge info) for calls where the analyzer "figured out" what the
> > function
> > pointer was in the region_model, even if there wasn't a
> > cgraph_node.
> 
> > 
> > Does that make sense?
> 
> ok so we are leaving the decision of how to handle case (b) to
> explodedgraph with the additional info from the regional model and
> create a call and return supernodes for all type of function calls
> whether or not middle-end know which function is called or not, makes
> sense. ( ok so this answers my previous question )
> 
> I went through supergraph.cc and can see the
> splitting happening in the constructor (supergraph::supergraph ())
> at the end of the first pass.

It sounds to me like you are on the right track.

> 
> > 
> > Or you could attack the problem from the other direction, by
> > looking at
> > what GCC generates for a vfunc call, and seeing if you can get the
> > region_model to "figure out" what the function pointer is at a
> > particular exploded_node.
> 
> I will also be looking at this after fixing the above problem; my
> current plan is to see how GCC's devirtualiser does it.

OK.

> 
> > 
> > > 
> > > also, should I prefer discussing about this bug here( gcc mailing
> > > list) or on the bugzilla itself ?
> > 
> > Either way works for me.  Maybe on this list?  (given that this
> > feels
> > like a design question)
> 
> ok
> 
> > 
> > Hope this is helpful
> > Dave
> 
> Thanks
> 
> - Ankur

Great.

Let me know how you get on.

As I understand it, Google recommends that we're exchanging emails
about our GSoC project at least two times a week, so please do continue
to report in, whether you're making progress, or if you feel you're
stuck on something.

Hope this is constructive.
Dave




Re: progress update

2021-06-15 Thread David Malcolm via Gcc
On Tue, 2021-06-15 at 19:42 +0530, Ankur Saini wrote:
> 
> 
> > On 13-Jun-2021, at 8:22 PM, David Malcolm 
> > wrote:
> > 
> > On Sun, 2021-06-13 at 19:11 +0530, Ankur Saini wrote:
> > > 
> > > 
> > > > On 08-Jun-2021, at 11:24 PM, David Malcolm
> > > > mailto:dmalc...@redhat.com>>
> > > > wrote:
> > > > 
> > > > Is there a URL for your branch?
> > > 
> > > no, currently it only local branch on my machine. Should I upload
> > > it on
> > > a hosting site ( like GitHub ) ? or can I create a branch on remote
> > > also ?
> > 
> > At some point we want you to be able to push patches to trunk, so as
> > a
> > step towards that I think it would be good for you to have a personal
> > branch on the gcc git repository.
> > 
> > A guide to getting access is here:
> >  https://gcc.gnu.org/gitwrite.html <
> > https://gcc.gnu.org/gitwrite.html>
> > 
> > I will sponsor you.
> 
> I have filled the form.

Thanks.  I've clicked on the "Approve" button; presumably we now need
to wait for an admin to make this happen.


[...]

> > > 
> 
> > Great.
> > 
> > Let me know how you get on.
> > 
> > As I understand it, Google recommends that we're exchanging emails
> > about our GSoC project at least two times a week, so please do
> > continue
> > to report in, whether you're making progress, or if you feel you're
> > stuck on something.
> 
> ok I would be more active from now on.

Yes please.

> 
> —
> 
> btw while using the gdb on “xgcc”, for some reason, debugger is not
> tracing the call to "run_checkers()” and is directly jumping from
> "pass_analyzer::execute()” to some instruction inside
> "ana::dump_analyzer_json()”. 
> 
> I am invoking debugger like this  :- 
> 
> —
> $ ./xgcc /Users/ankursaini/Desktop/test.c -fanalyzer -B . -wrapper
> gdb,--args
> —
> 
> and then while putting a breakpoint on “ana::run_checkers()”, gdb
> places 2 breakpoints ( one on correct position and another weirdly
> inside a different function in the same file )
> 
> —
> (gdb) br ana::run_checkers() 
> Breakpoint 3 at 0x101640990 (2 locations)
> 
> (gdb) info br
> Num Type   Disp Enb Address    What
> 1   breakpoint keep y   0x00010174ade7 in
> fancy_abort(char const*, int, char const*) at ../../gcc-
> source/gcc/diagnostic.c:1915
> 2   breakpoint keep y   0x00010174ee01 in
> internal_error(char const*, ...) at ../../gcc-
> source/gcc/diagnostic.c:1835
> 
> 3   breakpoint keep y   <MULTIPLE>
> 3.1  y   0x000101640990 <ana::dump_analyzer_json(ana::supergraph
> const&, ana::exploded_graph const&)+48>
> 3.2  y   0x000101640ba0 in
> ana::run_checkers() at ../../gcc-source/gcc/analyzer/engine.cc:4918
> 
> —
> 
> but during the execution it only hits the breakpoint 3.1 ( which is
> inside the function "ana::dump_analyzer_json()” which according to me
> is called during the execution of "impl_run_checkers()”, after
> completing the analysis to dump the results in json format ) 
> 
> after looking at backtrace, I could see it calling
> "pass_analyzer::execute()” where “run_checkers()” should be called,
> but no such call (or a call to "impl_run_checkers()”)  is seen there
> .
> 
> here is the backtrace when debugger hits this breakpoint 3.1
> —
> (gdb) c
> Continuing.
> [New Thread 0x1c17 of process 2392]
> 
> Thread 2 hit Breakpoint 3, 0x000101640990 in
> ana::dump_analyzer_json (sg=..., eg=...) at ../../gcc-
> source/gcc/analyzer/engine.cc:4751
> 4751  char *filename = concat (dump_base_name,
> ".analyzer.json.gz", NULL);
> 
> (gdb) bt
> #0  0x000101640990 in ana::dump_analyzer_json (sg=..., eg=...) at
> ../../gcc-source/gcc/analyzer/engine.cc:4751
> #1  0x00010161a919 in (anonymous
> namespace)::pass_analyzer::execute (this=0x142b0a660) at ../../gcc-
> source/gcc/analyzer/analyzer-pass.cc:87
> #2  0x00010106319c in execute_one_pass (pass=<optimized out>,
> pass@entry=<optimized out>) at ../../gcc-source/gcc/passes.c:2567
> #3  0x000101064e1c in execute_ipa_pass_list (pass= 0x0>) at ../../gcc-source/gcc/passes.c:2996
> #4  0x000100a89065 in symbol_table::output_weakrefs
> (this=) at ../../gcc-source/gcc/cgraphunit.c:2262
> #5  0x000102038600 in ?? ()
> #6  0x in ?? ()
> —
> 
> but at the end I can see the analyzer doing it’s work and generating
> the required warning as intended.
> 
> I never used to experience this problem earlier when using debugger
> on a full bootstrapped build. Looks like I am missing something here.

You're passing -wrapper gdb,--args when compiling a .c file, so gdb is
debugging the "cc1" invocation, of the cc1 in the "gcc" subdirectory.

gdb is using the debuginfo embedded in the cc1 binary to locate
functions, decode expressions, etc.   (the debuginfo is probably in
DWARF format, embedded in a section of an ELF-formatted binary file).

If you're doing a full bootstrap build, then the gcc subdirectory and
its cc1 are the "3rd stage".  This 3rd stage was built using the 2nd
stage's xgcc/cc1plus, and thus the debuginfo inside the cc1 

Re: Progress update on extending static analyser to support c++'s virtual function

2021-06-22 Thread David Malcolm via Gcc
On Mon, 2021-06-21 at 14:22 +0530, Ankur Saini wrote:
> so I have good news and bad news 
> 
> good news is that I was successfully able to split the calls at every
> call-site during the creation of super-graph. 
> 
> I did it by simply adding an 'else’ statement where analyser handles
> splitting of snodes, so that it can still handle the known calls (
> one with a cgraph_edge ) and also split the calls at the unknown call
> sites for analyzer to later speculate the source of the call with
> more information from regional models. 
> 
> something like this :-
> 
> in `ana::supergraph::supergraph(ana::logger*)` in supergraph.cc
> 
> 185 if (cgraph_edge *edge = supergraph_call_edge (fun, stmt))
> 186 {
> 187    m_cgraph_edge_to_caller_prev_node.put(edge, node_for_stmts);
> 188    node_for_stmts = add_node (fun, bb, as_a <gcall *> (stmt), NULL);
> 189    m_cgraph_edge_to_caller_next_node.put (edge, node_for_stmts);
> 190 }
> 191 else
> 192 {
> 193   gcall *call = dyn_cast <gcall *> (stmt);
> 194   if (call)
> 195 node_for_stmts = add_node (fun, bb, as_a <gcall *> (stmt), NULL);
> 196 }
> 
> after building I could see analyzer creating snodes for returning
> calls from the function it was not before for various examples. 
> 

Great.   Have you made any progress on creating eedges/enodes for such
calls?

> —
> 
> now the bad news. 
> 
> I accidentally overwrote the file containing my ssh key to
> gcc.gnu.org  , with another ssh key. :(
> 
> is there something that I can do to retrieve it back ? or is it lost
> forever and I have no option left other than contacting 
> overse...@gcc.gnu.org  regarding the
> same ?

If you've overwritten the private key, then there's no way to get it
back (if I understand things correctly).  You should create a new one
and contact overseers.

Dave



Re: daily report on extending static analyzer project [GSoC]

2021-06-24 Thread David Malcolm via Gcc
On Thu, 2021-06-24 at 19:59 +0530, Ankur Saini wrote:
> CURRENT STATUS :
> 
> analyzer is now splitting nodes even at call sites which don’t have
> a cgraph_edge. But as the call and return nodes are not
> connected yet, the part of the function after such calls becomes
> unreachable, making it impossible to properly analyse.
> 
> AIM for today : 
> 
> - try to create an intra-procedural link between the
> calling and returning snodes 
> - find the place where the exploded nodes and edges are being formed 
> - figure out the program point where exploded graph would know about
> the function calls
> 
> —
> 
> PROGRESS :
> 
> - I initially tried to connect the calling and returning snodes with
> an intraprocedural sedge but looks like for that only nodes which
> have a cgraph_edge or a CFG edge are connected in the supergraph. I
> tried a few ways to connect them but at the end thought I would be
> better off leaving them like this and connecting them during the
> creation of exploded graph itself.
> 
> - As the exploded graph is created during building and processing of
> the worklist, "build_initial_worklist ()” and “process_worklist()”
> should be the interesting areas to analyse, especially the processing
> part.
> 
> - “build_initial_worklist()” is just creating enodes for functions
> that can be called explicitly ( possible entry points ) so I guess
> the better place to investigate is “process_worklist ()” function.

Yes.

Have a look at exploded_graph::process_node (which is called by
process_worklist).
The eedges for calls with supergraph edges happen there in
the "case PK_AFTER_SUPERNODE:", which looks at the outgoing superedges
from that supernode and calls node->on_edge on them, creating an
exploded node/exploded edge for each outgoing superedge.

So you'll need to make some changes there, I think.
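Roughly, the shape of that code is something like this (I'm paraphrasing
from memory, so the details may not exactly match engine.cc; treat it as
a sketch of the structure, not a quote of the sources):

```
/* Rough paraphrase of part of exploded_graph::process_node.  */
case PK_AFTER_SUPERNODE:
  {
    int i;
    superedge *succ;
    /* For each outgoing superedge of this supernode...  */
    FOR_EACH_VEC_ELT (point.get_supernode ()->m_succs, i, succ)
      {
	/* ...compute the point/state on the far side of the edge...  */
	program_point next_point
	  = program_point::before_supernode (succ->m_dest, succ,
					     point.get_call_string ());
	program_state next_state (state);
	uncertainty_t uncertainty;

	/* ...reject infeasible edges (this is roughly where your new
	   "call via known function pointer" case would hook in)...  */
	if (!node->on_edge (*this, succ, &next_point, &next_state,
			    &uncertainty))
	  continue;

	/* ...and create the exploded node/edge for the transition.  */
	exploded_node *next
	  = get_or_create_node (next_point, next_state, node);
	if (next)
	  add_edge (node, next, succ);
      }
  }
  break;
```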

> 
> —
> 
> STATUS AT THE END OF THE DAY :- 
> 
> - try to create an intra-procedural link between the
> calling and returning snodes ( Abandoned )

You may find the above useful if you're going to do it based on the
code I mentioned above.

> - find the place where the exploded nodes and edges are being formed
> ( Done )
> - figure out the program point where exploded graph knows about the
> function call ( Pending )
> 

Thanks for the update.
Hope the above is helpful.

Dave



Re: daily report on extending static analyzer project [GSoC]

2021-06-25 Thread David Malcolm via Gcc
On Fri, 2021-06-25 at 20:33 +0530, Ankur Saini wrote:
> AIM for today : 
> 
> - try to create an intra-procedural link between the calling
> and returning snodes
> - figure out the program point where exploded graph would know about
> the function calls
> - figure out how the exploded node will know which function to call
> - create enodes and eedges for the calls
> 
> —
> 
> PROGRESS :
> 
> - I created an intraprocedural link where the splitting is
> happening, to connect the call and returning snodes, like this :-
> 
> (in supergraph.cc at "supergraph::supergraph (logger *logger)" )
> ```
> 185 if (cgraph_edge *edge = supergraph_call_edge (fun, stmt))
> 186 {
> 187    m_cgraph_edge_to_caller_prev_node.put(edge, node_for_stmts);
> 188    node_for_stmts = add_node (fun, bb, as_a <gcall *> (stmt), NULL);
> 189    m_cgraph_edge_to_caller_next_node.put (edge, node_for_stmts);
> 190 }
> 191 else
> 192 {
> 193   gcall *call = dyn_cast <gcall *> (stmt);
> 194   if (call)
> 195   {
> 196 supernode *old_node_for_stmts = node_for_stmts;
> 197 node_for_stmts = add_node (fun, bb, as_a <gcall *> (stmt), NULL);
  ^
Given the dyn_cast of stmt to gcall * at line 193 you can use "call"
here, without the as_a cast, as you've already got "stmt" as a gcall *
at line 193.

You might need to add a hash_map recording the mapping from such stmts
to the edges, like line 189 does.  I'm not sure, but you may need it
later.


> 198
> 199 superedge *sedge = new callgraph_superedge 
> (old_node_for_stmts,
> 200 node_for_stmts,
> 201 SUPEREDGE_INTRAPROCEDURAL_CALL,
> 202 NULL);
> 203 add_edge (sedge);
> 204   }    
> 205 }
> ```
> 
> - now that we have an intraprocedural link between such calls, the
> analyzer will consider them as an “impossible edge” ( whenever
> "node->on_edge()” returns false ) while processing the worklist, and I
> think this should be the correct place to speculate about the function
> call by creating exploded nodes and edges representing calls ( maybe
> by adding custom edge info ).
> 
> - after several failed attempts to do as mentioned above, it looks like
> I was looking the wrong way all along. I think I just found out what my
> mentor meant when telling me to look into "calls node->on_edge”. During
> the edge inspection ( in program_point::on_edge() ), if it’s an
> intraprocedural sedge, maybe I can add an extra intraprocedural sedge
> to the correct edge right here with the info state of that program
> point. 

I don't think we need a superedge for such a call, just an
exploded_edge.  (Though perhaps adding a superedge might make things
easier?  I'm not sure, but I'd first try not bothering to add one)

> 
> Q. But even if we find out which function to call, how will the
> analyzer know which snode does that function belong ?

Use this method of supergraph:
  supernode *get_node_for_function_entry (function *fun) const;
to get the supernode for the entrypoint of a given function.

You can get the function * from a fndecl via DECL_STRUCT_FUNCTION.
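i.e. something like this sketch (untested; "sg" here stands for the
supergraph, and "fndecl" for whatever FUNCTION_DECL you recovered from
the region_model):

```
/* Sketch: from a FUNCTION_DECL to the supernode for the function's
   entrypoint.  */
if (function *fun = DECL_STRUCT_FUNCTION (fndecl))
  {
    supernode *entry_snode = sg.get_node_for_function_entry (fun);
    /* ...then create an exploded node/edge for the call into
       ENTRY_SNODE.  */
  }
```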

> Q. on line 461 of program-point.cc 
> 
> ```
> 457 else
> 458   {
> 459 /* Otherwise, we ignore these edges  */
> 460 if (logger)
> 461   logger->log ("rejecting interprocedural edge");
> 462 return false;
> 463   }
> ```
> why are we rejecting “interprocedural" edge when we are examining an
> “intraprocedural” edge ? or is it for the "cg_sedge->m_cedge” edge,
> which is an interprocedural edge ?

Currently, those interprocedural edges don't do much.  In the lines
just above the ones you quote (above that "else" clause) there is some
support for call summaries.

The idea is that we ought to be able to compute summaries of what a
function call does, and avoid exponential explosions during the
analysis by reusing summaries at a callsite.  But that code doesn't
work well at the moment; see:
  https://gcc.gnu.org/bugzilla/showdependencytree.cgi?id=99390

If you ignore call summaries for now, I think you need to change this
logic so it detects if we have a function pointer that we "know" the
value of from the region_model, and have it generate an exploded_node
and exploded_edge for the call.  Have a look at how SUPEREDGE_CALL is
handled by program_state and program_point; you should implement
something similar, I think.  Given that you need both the super_edge,
point *and* state all together to detect this case, I think the logic
you need to add probably needs to be in exploded_node::on_edge as a
specialcase before the call there to next_point->on_edge.

Hope this is helpful
Dave


> 
> STATUS AT THE END OF T

Re: daily report on extending static analyzer project [GSoC]

2021-06-27 Thread David Malcolm via Gcc
On Sat, 2021-06-26 at 20:50 +0530, Ankur Saini wrote:
> 
> > On 25-Jun-2021, at 9:04 PM, David Malcolm 
> > wrote:
> > 
> > On Fri, 2021-06-25 at 20:33 +0530, Ankur Saini wrote:
> > > AIM for today : 
> > > 
> > > - try to create an intra-procedural link between the calls the
> > > calling
> > > and returning snodes
> > > - figure out the program point where exploded graph would know
> > > about
> > > the function calls
> > > - figure out how the exploded node will know which function to
> > > call
> > > - create enodes and eedges for the calls
> > > 
> > > —
> > > 
> > > PROGRESS :
> > > 
> > > - I created an intraprocedural link between where the the
> > > splitting is happening to connect the call and returning snodes.
> > > like this :-
> > > 
> > > (in supergraph.cc at "supergraph::supergraph (logger *logger)" )
> > > ```
> > > 185 if (cgraph_edge *edge = supergraph_call_edge
> > > (fun, stmt))
> > > 186 {
> > > 187    m_cgraph_edge_to_caller_prev_node.put(edge,
> > > node_for_stmts);
> > > 188    node_for_stmts = add_node (fun, bb, as_a
> > > <gcall *> (stmt), NULL);
> > > 189    m_cgraph_edge_to_caller_next_node.put (edge,
> > > node_for_stmts);
> > > 190 }
> > > 191 else
> > > 192 {
> > > 193   gcall *call = dyn_cast <gcall *> (stmt);
> > > 194   if (call)
> > > 195   {
> > > 196 supernode *old_node_for_stmts =
> > > node_for_stmts;
> > > 197 node_for_stmts = add_node (fun, bb, as_a
> > > <gcall *> (stmt), NULL);
> > 
> > ^
> > Given the dyn_cast of stmt to gcall * at line 193 you can use
> > "call"
> > here, without the as_a cast, as you've already got "stmt" as a
> > gcall *
> > as tline 193.
> 
> ok
> 
> > 
> > You might need to add a hash_map recording the mapping from such
> > stmts
> > to the edges, like line 189 does.  I'm not sure, but you may need
> > it
> > later.
> 
> but the node is being created if there is no cgraph_edge
> corresponding to the call, so which edge should I map
> “node_for_stmts" to ?

Sorry; I think I got confused.  Re-reading this part of my email, it
doesn't make sense to me.  Sorry.

[...snip...]

> 
> 
> > 
> > > 
> > > Q. But even if we find out which function to call, how will the
> > > analyzer know which snode does that function belong ?
> > 
> > Use this method of supergraph:
> >  supernode *get_node_for_function_entry (function *fun) const;
> > to get the supernode for the entrypoint of a given function.
> > 
> > You can get the function * from a fndecl via DECL_STRUCT_FUNCTION.
> 
> so once we get fndecl, it should be comparatively smooth sailing from
> there. 
> 
> My attempt to get the value of function pointer from the state : -
> 
> - to access the region model of the state, I tried to access
> “m_region_model” of that state.
> - now I want to access cluster for a function pointer.
> - but when looking at the accessible functions to region model class,
> I couldn’t seem to find the fitting one. ( the closest I could find
> was “region_model::get_reachable_svalues()” to get a set of all the
> svalues reachable from that model )

In general you can use:
  region_model::get_rvalue
to go from a tree to a symbolic value for what the analyzer "thinks"
the value of that tree is at that point along the path.

If it "knows" that it's a specific function pointer, then IIRC this
will return a region_svalue where region_svalue::get_pointee () will
(hopefully) point at the function_region representing the memory
holding the code of the function.  function_region::get_fndecl should
then give you the tree for the specific FUNCTION_DECL, from which you
can find the supergraph node etc.
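So the chain would be roughly this (a sketch from memory, so the exact
helper names may differ slightly; "call" is the gcall and "ctxt" an
impl_region_model_context):

```
/* Sketch: ask the region_model what it "thinks" the function pointer
   is, and try to recover the FUNCTION_DECL from it.  */
const svalue *fn_sval = model->get_rvalue (gimple_call_fn (call), &ctxt);
if (const region_svalue *ptr_sval = fn_sval->dyn_cast_region_svalue ())
  if (const function_region *fn_reg
	= ptr_sval->get_pointee ()->dyn_cast_function_region ())
    {
      tree fndecl = fn_reg->get_fndecl ();
      /* ...then DECL_STRUCT_FUNCTION and
	 supergraph::get_node_for_function_entry give the supernode.  */
    }
```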

It looks like
  region_model::get_fndecl_for_call
might already do most of what you need, but it looks like it bails out
for the "NULL cgraph_node" case.  Maybe that needs fixing, so that it
returns the fndecl for that case?  That already gets used in some
places, so maybe try putting a breakpoint on that and see if fixing
that gets you further?

Hope this is helpful
Dave



Re: daily report on extending static analyzer project [GSoC]

2021-06-28 Thread David Malcolm via Gcc
On Mon, 2021-06-28 at 20:23 +0530, Ankur Saini wrote:
> 
> 
> > On 28-Jun-2021, at 12:18 AM, David Malcolm 
> > wrote:
> > > 
> > > > 
> > > > > 
> > > > > Q. But even if we find out which function to call, how will
> > > > > the
> > > > > analyzer know which snode does that function belong ?
> > > > 
> > > > Use this method of supergraph:
> > > >  supernode *get_node_for_function_entry (function *fun) const;
> > > > to get the supernode for the entrypoint of a given function.
> > > > 
> > > > You can get the function * from a fndecl via
> > > > DECL_STRUCT_FUNCTION.
> > > 
> > > so once we get fndecl, it should be comparatively smooth sailing
> > > from
> > > there. 
> > > 
> > > My attempt to get the value of function pointer from the state :
> > > -
> > > 
> > > - to access the region model of the state, I tried to access
> > > “m_region_model” of that state.
> > > - now I want to access cluster for a function pointer.
> > > - but when looking at the accessible functions to region model
> > > class,
> > > I couldn’t seem to find the fitting one. ( the closest I could
> > > find
> > > was “region_model::get_reachable_svalues()” to get a set of all
> > > the
> > > svalues reachable from that model )
> > 
> > In general you can use:
> >  region_model::get_rvalue
> > to go from a tree to a symbolic value for what the analyzer
> > "thinks"
> > the value of that tree is at that point along the path.
> > 
> > If it "knows" that it's a specific function pointer, then IIRC this
> > will return a region_svalue where region_svalue::get_pointee ()
> > will
> > (hopefully) point at the function_region representing the memory
> > holding the code of the function.  function_region::get_fndecl
> > should
> > then give you the tree for the specific FUNCTION_DECL, from which
> > you
> > can find the supergraph node etc.
> > 
> > It looks like
> >  region_model::get_fndecl_for_call
> > might already do most of what you need, but it looks like it bails
> > out
> > for the "NULL cgraph_node" case.  Maybe that needs fixing, so that
> > it
> > returns the fndecl for that case?  That already gets used in some
> > places, so maybe try putting a breakpoint on that and see if fixing
> > that gets you further?
> 
> shouldn’t the fn_decl still have a cgraph_node if the function
> is declared in the program itself ? it should just not have an edge
> representing the call.

That would make sense.  I'd suggest verifying that in the debugger.

> Because I was able to find the super-graph node just with the help of
> the function itself.

Great.


> 
> this is how the function looks "exploded_node::on_edge()" right now.
> 
> File: {$SCR_DIR}/gcc/analyzer/engine.cc
> 1305: bool
> 1306: exploded_node::on_edge (exploded_graph &eg,
> 1307:   const superedge *succ,
> 1308:   program_point *next_point,
> 1309:   program_state *next_state,
> 1310:   uncertainty_t *uncertainty)
> 1311: {
> 1312:   LOG_FUNC (eg.get_logger ());
> 1313: 
> 1314:   if (succ->m_kind == SUPEREDGE_INTRAPROCEDURAL_CALL)
> 1315:   {    
> 1316: const program_point *this_point = &this->get_point();
> 1317: const program_state *this_state = &this->get_state ();
> 1318: const gcall *call = this_point->get_supernode ()->get_final_call ();    
> 1319: 
> 1320: impl_region_model_context ctxt (eg, 
> 1321:   this, 
> 1322:   this_state, 
> 1323:   next_state, 
> 1324:   uncertainty,
> 1325:   this_point->get_stmt());
> 1326: 
> 1327: region_model *model = this_state->m_region_model;
> 1328: tree fn_decl = model->get_fndecl_for_call(call,&ctxt);
> 1329: if(DECL_STRUCT_FUNCTION(fn_decl))
> 1330: {
> 1331:   const supergraph *sg = &eg.get_supergraph();
> 1332:   supernode * sn =  sg->get_node_for_function_entry
> (DECL_STRUCT_FUNCTION(fn_decl));
> 1333:   // create enode and eedge ?
> 1334: }
> 1335:   }
> 1336: 
> 1337:   if (!next_point->on_edge (eg, succ))
> 1338: return false;
> 1339: 
> 1340:   if (!next_state->on_edge (eg, this, succ, uncertainty))
> 1341: return false;
> 1342: 
> 1343:   return true;
> 1344: }

Looks promising.

> 
> for now, it is also detecting calls that already have call_sedge
> connecting them, so I think I also have to filter them out.

Right, I think so too.

Dave



Re: daily report on extending static analyzer project [GSoC]

2021-06-29 Thread David Malcolm via Gcc
On Tue, 2021-06-29 at 22:04 +0530, Ankur Saini wrote:
> AIM for today : 
> 
> - filter out the nodes which already have a supergraph edge
> representing the call to avoid creating another edge for the call
> - create enode for destination
> - create eedge representing the call itself
> 
> —
> 
> PROGRESS :
> 
> - in order to filter out only the relevant edges, I simply used the
> fact that the edge that we care about will not have any call_graph
> edge associated with it. ( means “sedge->get_any_callgraph_edge()"
> would return NULL )
> 
> - I was also successfully able to create the enode and connect it
> with an eedge representing the call and was able to see it calling
> the correct function on some examples. :)
> 
> - But the problem now is returning from the function, which turned
> out to be bigger than I thought it was. 
> 
> - In order to tackle this problem, I first tried to update the
> call_string with the call, but the only way to push a call to the
> string I found was via “call_string::push_call()” function which
> finds the return_superedge from the cgraph_edge representing the
> return call ( which we don’t have )
> 
> so I decided to make an overload of "call_string::push_call()” which
> directly takes a return_superedge and pushes it onto the underlying
> vector of edges instead of taking it from the calling edge. It looks
> something like this :-
> 
> File:  {$SCR_DIR}/gcc/analyzer/call-string.cc
> 
> 158: void
> 159: call_string::push_call(const return_superedge *return_sedge)
> 160: {
> 161:   gcc_assert (return_sedge);
> 162:   m_return_edges.safe_push (return_sedge);
> 163: }

Looks reasonable.

> 
> I also created a temporary return_superedge ( as we now have the
> source and destination ), and tried to update the call_string with it,
> just to find out that the call_string is private to program_point. 

I confess I'm having a little difficulty visualizing what the superedge
looks like with this new edge.


FWIW you can use the accessor:
  program_point::get_call_string ()
to get it in const form:
  const call_string &get_call_string () const { return m_call_string; }

but it sounds like you're trying to change things.



The purpose of class call_string is to track the stack of call sites,
so that when we return from a function, we return to the correct call
site.

I wonder if class call_string could be updated so that rather than
capturing a vec of superedges:
  auto_vec<const return_superedge *> m_return_edges;
it captures a vec of gcall *?

Then you wouldn't need a superedge ahead of time for the return from
the call.

I'm not sure if that would work, but that might be another approach you
could try, and might be simplest.  I'm not sure.
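Something like this, say (just a sketch of the idea, with a made-up
field name, not something I've tried):

```
/* Sketch: track the stack of call sites as the call statements
   themselves, rather than as return superedges.
   ("m_call_sites" is a hypothetical name.)  */
class call_string
{
  /* ... */
 private:
  auto_vec<const gcall *> m_call_sites;  /* innermost call last.  */
};

void
call_string::push_call (const gcall *call_stmt)
{
  gcc_assert (call_stmt);
  m_call_sites.safe_push (call_stmt);
}
```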

I *think* we only use the return_superedge within
program_point::on_edge, comparing against the successor edge, but that
code could be rewritten to look at which gcall * is associated with the
edge.

(again, I'm not sure, but maybe it's simpler)


> So my plan for the next day is to add a custom function to the
> program_point class to update the call stack and return back to the
> correct spot. 
> 
> If there is a better way of doing it then do let me know.
> 
> STATUS AT THE END OF THE DAY :- 
> 
> - filter out the the nodes which already have an supergraph edge
> representing the call ( Done )
> - create enode for destination ( Done )
> - create eedge representing the call itself ( Done ? )
> 
> —
> 
> P.S. it has been over a week since I sent a mail to
> overse...@gcc.gnu.org  regarding the
> ssh key incident and I haven’t got any response from them till now,
> does this usually take this long for them to respond ? or does this
> mean I didn’t provide some information to them that I should have.
> Is there something else I can do regarding this problem ?

I'd try resending the email.

Hope this is helpful
Dave



Re: daily report on extending static analyzer project [GSoC]

2021-06-30 Thread David Malcolm via Gcc
On Wed, 2021-06-30 at 21:39 +0530, Ankur Saini wrote:
> 
> 
> > On 30-Jun-2021, at 1:23 AM, David Malcolm 
> > wrote:
> > 
> > On Tue, 2021-06-29 at 22:04 +0530, Ankur Saini wrote:
> > 

[...]

> > > P.S. it has been over a week since I sent a mail to    
> > > overse...@gcc.gnu.org regarding
> > > the
> > > ssh key incident and I haven’t got any response form them till
> > > now,
> > > does this usually take this long for them to respond ? or does
> > > this
> > > means I didn’t provide some information to them that I should
> > > have.
> > > Is there something else I can do regarding this problem ?
> > 
> > I'd try resending the email.
> 
> ok I would be resending the mail again.
> Also should I cc that mail to you also ( similar to how they expect
> us to cc the sponsor at the time of creation of a new account ) ?

Yes please; that's a good idea

Dave
> 



Re: daily report on extending static analyzer project [GSoC]

2021-07-06 Thread David Malcolm via Gcc
On Sat, 2021-07-03 at 20:07 +0530, Ankur Saini wrote:
> AIM for today : 
> 
> - update the call_stack to track something else other than supergraph
> edges
> 
> —
> 
> PROGRESS :
> 
> - After some brainstorming about tracking the callstack, I think one
> better way to track the call stack is to keep track of source and
> destination of the call instead of superedges representing them. 
> 
> - so I branched again and updated the call-string to now capture a pair
> of supernodes ( one representing the callee and the other representing
> the caller ); this way I was not only able to easily port the entire code
> to adapt it without much difficulty, but can now also push calls to
> functions that don’t possess a superedge.
> 
> - changes done can be seen on the "
> refs/users/arsenic/heads/call_string_update “ branch. ( currently this
> branch doesn’t contain other changes I have done till now, as I wanted
> to test the new call-string representation exclusively and make sure it
> doesn’t affect the functionality of the base analyser )

I'm not an expert at git, so it took me a while to figure out how to
access your branch.

It's easier for me if you can also use "git format-patch" to generate a
patch and "git send-email" to send it to this list (and me, please), so
that the patch content is going to the list.

The approach in the patch seems reasonable.

I think you may have a memory leak, though: you're changing call_string
from:
  auto_vec<const return_superedge *> m_return_edges;
to:
  auto_vec<std::pair<const supernode *, const supernode *> *> m_supernodes;

and the std::pairs are being dynamically allocated on the heap.
Ownership gets transferred by call_string::operator=, but if I'm
reading the patch correctly never get deleted.  This is OK for
prototyping, but will need fixing before the code can be merged.

It's probably simplest to get rid of the indirection and allocation in
m_supernodes and have the std::pair be stored by value, rather than by
pointer, i.e.:
  auto_vec<std::pair<const supernode *, const supernode *> > m_supernodes;

Does that work? (or is there a problem I'm not thinking of).

If that's a problem, I think you might be able to get away with
dropping the "first" from the pair, and simply storing the supernode to
return to; I think the only places that "first" gets used are in dumps
and in validation.  But "first" is probably helpful for debugging, so
given that you've got it working with that field, better to keep it.

Hope this is helpful
Dave

> 
> - now hopefully all that is left for tomorrow is to update the analyzer
> to finally see, call and return from the function called via the
> function pointer. 
> 
> STATUS AT THE END OF THE DAY :- 
> 
> - update the call_stack to track something else other than supergraph
> edges ( done )
> 
> Thank you
> - Ankur
> 




Re: daily report on extending static analyzer project [GSoC]

2021-07-06 Thread David Malcolm via Gcc
On Tue, 2021-07-06 at 18:46 -0400, David Malcolm wrote:
> On Sat, 2021-07-03 at 20:07 +0530, Ankur Saini wrote:
> > AIM for today : 
> > 
> > - update the call_stack to track something else other than
> > supergraph
> > edges
> > 
> > —
> > 
> > PROGRESS :
> > 
> > - After some brainstorming about tracking the callstack, I think
> > one
> > better way to track the call stack is to keep track of source and
> > destination of the call instead of supperedges representing them. 
> > 
> > - so I branched again and updated the call-string to now capture a
> > pair
> > of supernodes ( one representing callee and other representing
> > caller
> > ), like this I was not only able to easily port the entire code to
> > adapt it without much difficulty, but now can also push calls to
> > functions that doesn’t possess a superedge.
> > 
> > - changes done can be seen on the "
> > refs/users/arsenic/heads/call_string_update “ branch. ( currently
> > this
> > branch doesn’t contain other changes I have done till now, as I
> > wanted
> > to test the new call-string representation exclusively and make
> > sure it
> > doesn’t affect the functionality of the base analyser )
> 
> I'm not an expert at git, so it took me a while to figure out how to
> access your branch.
> 
> It's easier for me if you can also use "git format-patch" to generate
> a
> patch and "git send-email" to send it to this list (and me, please),
> so
> that the patch content is going to the list.
> 
> The approach in the patch seems reasonable.
> 
> I think you may have a memory leak, though: you're changing
> call_string
> from:
>   auto_vec<const return_superedge *> m_return_edges;
> to:
>   auto_vec<std::pair<const supernode *, const supernode *> *>
> m_supernodes;

One other, unrelated idea that just occurred to me: those lines get
very long, so maybe introduce a typedef e.g. 
  typedef std::pair<const supernode *, const supernode *> element_t;
so that you can refer to the pairs as call_string::element_t, and just
element_t when you're in call_string scope, and just have a:

  auto_vec<element_t> m_supernodes;

or

  auto_vec<element_t> m_elements; 

within the call_string, if that makes sense.  Does that simplify
things?

Dave



Re: daily report on extending static analyzer project [GSoC]

2021-07-06 Thread David Malcolm via Gcc
On Mon, 2021-07-05 at 21:45 +0530, Ankur Saini wrote:
> I forgot to send the daily report yesterday, so this one covers the
> work done on both days
> 
> AIM : 
> 
> - make the analyzer call the function with the updated call-string
> representation ( even the ones that doesn’t have a superedge )
> - make the analyzer figure out the point of return from the function
> called without the superedge
> - make the analyser figure out the correct point to return back in the
> caller function
> - make enode and eedge representing the return call
> - test the changes on the example I created before
> - speculate what GCC generates for a vfunc call and discuss how can we
> use it to our advantage
> 
> —
> 
> PROGRESS  ( changes can be seen on
> "refs/users/arsenic/heads/analyzer_extension “ branch of the repository
> ) :
> 
> - Thanks to the new call-string representation, I was able to push
> calls to the call stack which don’t have a superedge and was
> successfully able to see the calls happening via the function pointer.
> 
> - To detect the returning point of the function I used the fact that
> such supernodes would contain an EXIT bb, would not have any return
> superedge and would still have a pending call-stack. 
> 
> - Now the next part was to find out the destination node of the return,
> for this I again made use of the new call string and created a custom
> accessor to get the caller and callee supernodes of the return call,
> then I extracted the gcall* from the caller supernode to ulpdate the
> program state, 
> 
> - now that I have got next state and next point, it was time to put the
> final piece of puzzle together and create exploded node and edge
> representing the returning call.
> 
> - I tested the changes on the the following program where the analyzer
> was earlier giving a false negative due to not detecting call via a
> function pointer
> 
> ```
> #include <stdio.h>
> #include <stdlib.h>
> 
> void fun(int *int_ptr)
> {
>     free(int_ptr);
> }
> 
> int test()
> {
>     int *int_ptr = (int*)malloc(sizeof(int));
>     void (*fun_ptr)(int *) = &fun;
>     (*fun_ptr)(int_ptr);
> 
>     return 0;
> }
> 
> void test_2()
> {
>   test();
> }
> ```
> ( compiler explorer link : https://godbolt.org/z/9KfenGET9 )
> 
> and results were showing success where the analyzer was now able to
> successfully detect, call and return from the function that was called
> via the function pointer and no longer reported the memory leak it was
> reporting before. : )

This is great; well done!

It would be good to turn the above into a regression test.  I think you
can do that by simply adding it to gcc/testsuite/gcc.dg/analyzer.  You
could also add a case where fun_ptr is called twice, and check that it
reports it as a double-free (and add a dg-warning directive to verify
that it correctly complains).
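Perhaps something like this (an untested sketch; the dg-warning text is
a guess at the analyzer's wording, so you'll want to check what
-fanalyzer actually prints and adjust the directive to match):

```
/* Untested sketch of a possible gcc.dg/analyzer testcase.  */

#include <stdlib.h>

void fun (int *int_ptr)
{
  free (int_ptr);
}

void test (void)
{
  int *int_ptr = (int *) malloc (sizeof (int));
  void (*fun_ptr) (int *) = &fun;
  (*fun_ptr) (int_ptr);
  (*fun_ptr) (int_ptr); /* { dg-warning "double-'free'" } */
}
```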

I wonder if your branch has already fixed:
  https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100546

> 
> - I think I should point this out: in the process I created a lot of
> custom functions to access/alter some data which was not possible
> before.
> 
> - now that calls via function pointer are taken care of, it was time
> to see what exactly GCC generates when a function is
> dispatched dynamically, and as planned earlier, I went to  ipa-
> devirt.c ( GCC's devirtualizer implementation ) to investigate.
> 
> - although I didn’t understand everything that was happening there,
> here are some of the findings I thought might be interesting for the
> project :- 
> > the polymorphic call is called with a OBJ_TYPE_REF which
> contains otr_type( a type of class whose method is called) and
> otr_token (the index into virtual table where address is taken)
> > the devirtualizer builds a type inheritance graph to keep
> track of entire inheritance hierarchy
> > the most interesting function I found was
> “possible_polymorphic_call_targets()” which returns the vector of all
> possible targets of polymorphic call represented by a calledge or a
> gcall.
> > what I understood the devirtualizer do is to search in
> these polymorphic calls and filter out the the calls which are more
> likely to be called ( known as likely calls ) and then turn them into
> speculative calls which are later turned into direct calls.
> 
> - another thing I was curious to know was, how would analyzer behave
> when encountered with a polymorphic call now that we are splitting
> the superedges at every call. 
> 
> the results were interesting: I was able to see the analyzer splitting
> supernodes for the calls right away, but this time they were not
> connected via an intraprocedural edge, making the analyzer crash at
> the callsite ( I will look more into it tomorrow ) 
> 
> the example I used was : -
> ```
> struct A
> {
>     virtual int foo (void) 
>     {
>     return 42;
>     }
> };
> 
> struct B: public A
> {
>   int foo (void) 
>     { 
> return 0;
>     }
> };
> 
> i

Re: daily report on extending static analyzer project [GSoC]

2021-07-07 Thread David Malcolm via Gcc
On Wed, 2021-07-07 at 19:22 +0530, Ankur Saini wrote:
> 
> 
> > On 07-Jul-2021, at 4:16 AM, David Malcolm 
> > wrote:
> > 
> > On Sat, 2021-07-03 at 20:07 +0530, Ankur Saini wrote:
> > > AIM for today : 
> > > 
> > > - update the call_stack to track something else other than
> > > supergraph
> > > edges
> > > 
> > > —
> > > 
> > > PROGRESS :
> > > 
> > > - After some brainstorming about tracking the callstack, I think
> > > one
> > > better way to track the call stack is to keep track of source and
> > > destination of the call instead of supperedges representing them.
> > > 
> > > - so I branched again and updated the call-string to now capture
> > > a pair
> > > of supernodes ( one representing callee and other representing
> > > caller
> > > ), like this I was not only able to easily port the entire code
> > > to
> > > adapt it without much difficulty, but now can also push calls to
> > > functions that doesn’t possess a superedge.
> > > 
> > > - changes done can be seen on the "
> > > refs/users/arsenic/heads/call_string_update “ branch. ( currently
> > > this
> > > branch doesn’t contain other changes I have done till now, as I
> > > wanted
> > > to test the new call-string representation exclusively and make
> > > sure it
> > > doesn’t affect the functionality of the base analyser )
> > 
> > I'm not an expert at git, so it took me a while to figure out how
> > to
> > access your branch.
> > 
> > It's easier for me if you can also use "git format-patch" to
> > generate a
> > patch and "git send-email" to send it to this list (and me,
> > please), so
> > that the patch content is going to the list.
> > 
> > The approach in the patch seems reasonable.
> > 
> > I think you may have a memory leak, though: you're changing
> > call_string
> > from:
> >  auto_vec<const return_superedge *> m_return_edges;
> > to:
> >  auto_vec<std::pair<const supernode *, const supernode *> *>
> > m_supernodes;
> > 
> > and the std:pairs are being dynamically allocated on the heap.
> > Ownership gets transferred by call_string::operator=, but if I'm
> > reading the patch correctly never get deleted.  This is OK for
> > prototyping, but will need fixing before the code can be merged.
> 
> > 
> > It's probably simplest to get rid of the indirection and allocation
> > in
> > m_supernodes and have the std::pair be stored by value, rather than
> > by
> > pointer, i.e.:
> >  auto_vec<std::pair<const supernode *, const supernode *> >
> > m_supernodes;
> > 
> > Does that work? (or is there a problem I'm not thinking of).
> 
> yes, I noticed that while creating it; I was thinking to empty the vector
> and delete all the memory in the destructor of the call-string (
> or make them unique pointers and let them destroy themselves ) but
> it looks like storing the values of the pairs would make more sense.

Yes, just storing the std::pair rather than new/delete is much simpler.

There's also an auto_delete_vec<T> which stores (T *) as the elements
and deletes all of the elements in its dtor, but the assignment
operator/copy-ctor/move-assign/move-ctor probably don't work properly,
and the overhead of new/delete probably isn't needed.
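FWIW the by-value version might end up looking roughly like this (just
a sketch, reusing the element_t idea from before; untested):

```
/* Sketch (untested) of storing the pairs by value.  */
class call_string
{
 public:
  typedef std::pair<const supernode *, const supernode *> element_t;

  void push_call (const supernode *caller, const supernode *callee)
  {
    m_elements.safe_push (element_t (caller, callee));
  }

  /* ... */

 private:
  auto_vec<element_t> m_elements;
};
```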

> 
> > 
> > If that's a problem, I think you might be able to get away with
> > dropping the "first" from the pair, and simply storing the
> > supernode to
> > return to; I think the only places that "first" gets used are in
> > dumps
> > and in validation.  But "first" is probably helpful for debugging,
> > so
> > given that you've got it working with that field, better to keep
> > it.
> 
> yes, I see that too, but my idea is to keep it as is for now ( maybe
> it might turn out to be helpful in future ). I will change it back if
> it proves to be useless and we get time at the end.

Yes; my suggestion was just in case there were issues with fixing the
auto_vec.   It's better for debugging to have both of the pointers in
the element.

[...snip...]

> > > 
> > > I think you may have a memory leak, though: you're changing
> > > call_string
> > > from:
> > >   auto_vec<const return_superedge *> m_return_edges;
> > > to:
> > >   auto_vec<std::pair<const supernode *, const supernode *> *>
> > > m_supernodes;
> > 
> > One other, unrelated idea that just occurred to me: those lines get
> > very long, so maybe introduce a typedef e.g. 
> >  typedef std::pair<const supernode *, const supernode *> element_t;
> > so that you can refer to the pairs as call_string::element_t, and
> > just
> > element_t when you're in call_string scope, and just have a:
> > 
> >  auto_vec<element_t> m_supernodes;
> > 
> > or
> > 
> >  auto_vec<element_t> m_elements; 
> > 
> > within the call_string, if that makes sense.  Does that simplify
> > things?
> 
> Yes, this is a nice idea, I will update the call-stack with the next
> update to the analyzer, or should I update it and send a patch to the
> mailing list with these call_string changes for review first and then
> work on the other changes ?

I prefer reviewing code via emails to the mailing list, rather than
looking at it in the repository.  One benefit is that other list
subscribers (and archive readers) can easily see the code we're
discussing; this will become more significant as we go into the ipa-
devirt code which wasn't written by me.

That sa

Re: where is PRnnnn required again?

2021-07-07 Thread David Malcolm via Gcc
On Wed, 2021-07-07 at 16:58 -0600, Martin Sebor via Gcc wrote:
> On 7/7/21 4:24 PM, Jonathan Wakely wrote:
> > 
> > 
> > On Wed, 7 Jul 2021, 23:18 Martin Sebor wrote:
> > 
> >     On 7/7/21 3:53 PM, Marek Polacek wrote:
> >  > I'm not sure why you keep hitting so many issues; git addlog
> >     takes care of
> >  > this stuff for me and I've had no trouble pushing my
> > patches.  Is
> >     there
> >  > a reason you don't use it also?
> > 
> >     I probably have a completely different workflow.  Git addlog
> > isn't
> >     a git command (is it some sort of a GCC extension?), and what I
> > put
> >     in the subject of my emails is almost never the same thing as
> > what
> >     I put in the commit message. 
> > 
> > 
> > Why not? Why is it useful to write two different explanations of
> > the patch?
> 
> Sometimes, maybe.  I don't really think about it too much.  I'm not
> the only one who does it.  But what bearing does what we put in
> the subject of our patch submissions have on this discussion?

FWIW if you use a different subject line for the email as for commit
message, it makes it harder to find discussion about the patch in the
list archives.

> You may have one way of doing things and others another.  Yours may
> even be better/more streamlined, I don't know.  That doesn't mean
> our tooling should make things more difficult for the rest of us.
> 
> Martin
> 




Re: daily report on extending static analyzer project [GSoC]

2021-07-11 Thread David Malcolm via Gcc
On Sat, 2021-07-10 at 21:27 +0530, Ankur Saini wrote:
> AIM for today : 
> 
> - update the call_string to store a vector of pair of supernode*
> instead of pointer to it 
> - create a typdef to give a meaning full name to these " pair of
> supernode* “
> - send the patch list to gcc-patches mailing list
> - add the example program I created to the GCC tests
> 
> —
> 
> PROGRESS :
> 
> - I successfully changed the entire call string representation again to
> keep track of "auto_vec<element_t> m_elements;” instead of
> "auto_vec<std::pair<const supernode*, const supernode*> *> m_supernodes;” 
> 
> - while doing so, one hurdle I found was changing "hashval_t hash ()
> const;”, a function which I didn't quite understand properly, but
> looking at the source, it looked like it just needed some value ( integer
> or pointer ) to add to ‘hstate’ and ' m_elements’ was neither a pointer
> nor an integer, so I instead added the pointer to the callee supernode (
> “second” of the m_elements ) to the ‘hstate’ for now. 
> 
> - for the callstring patch, I created a patch file ( using git format-
> patch ) and sent it to patches mailing list (via git send email ) and
> CCed my mentor.
> Although I received a positive reply from the command log (git send
> email) saying the mail was sent, I didn’t receive that mail ( being
> subscribed to the patches list ) regarding the same ( I sent that just
> before sending this mail ).
> The mail should be sent from arse...@sourceware.org 

Thanks.

I see the patch email in the list archives here:
  https://gcc.gnu.org/pipermail/gcc-patches/2021-July/574888.html
but for some reason it's not showing up in my inbox.  I'm not sure why;
I recently got migrated to a new email server and my filters are
currently a mess so it could be a problem at my end; sorry if that's
the case.

Given that neither you nor I seem to have received the patch I wonder
if anyone else received it?

Given that, I'm going to reply to that patch email inline here (by
copying and pasting it from the archive):

> [PATCH 1/2] analyzer: refactor callstring to work with pairs of supernodes 
> [GSoC]
> 
> 2021-07-3  Ankur Saini  

There are some formatting rules that we follow with ChangeLog entries.
We have a script:

  ./contrib/gcc-changelog/git_check_commit.py --help

that you can run to check the formatting.

> * gcc/analyzer/call-string.cc: refactor callstring to work with pair 
> of supernodes instead of super superedges
> * gcc/analyzer/call-string.h: make callstring work with pairs of 
> supernodes
> * gcc/analyzer/program-point.cc: refactor program point to work with 
> new call-string format

The "gcc/analyzer" directory has its own ChangeLog file, and filenames
should be expressed relative to it, so these entries should read
something like:

gcc/analyzer/ChangeLog:
* call-string.cc: ...etc
* call-string.h: ...etc
* program-point.cc: ...etc

The entries should be sentences (i.e. initial capital letter and
trailing full-stop).

They should be line-wrapped (I do it at 74 columns), giving this:

gcc/analyzer/ChangeLog:
* call-string.cc: Refactor callstring to work with pair of
supernodes instead of superedges.
* call-string.h: Make callstring work with pairs of supernodes.
* program-point.cc: Refactor program point to work with new
call-string format.

Your text editor might have a mode for working with ChangeLog files.

[...snip...]

> @@ -152,22 +152,40 @@ call_string::push_call (const supergraph &sg,
>gcc_assert (call_sedge);
>const return_superedge *return_sedge = call_sedge->get_edge_for_return 
> (sg);
>gcc_assert (return_sedge);
> -  m_return_edges.safe_push (return_sedge);
> +  const std::pair<const supernode *, const supernode *> *e = new 
> (std::pair<const supernode *, const supernode *>)

We don't want lines wider than 80 columns unless it can't be helped. 
Does your text editor have a feature to warn you about overlong lines?

Changing from:
  std::pair
to:
  element_t
should make it much easier to avoid overlong lines.

[...snip...]

> diff --git a/gcc/analyzer/call-string.h b/gcc/analyzer/call-string.h
> index 7721571ed60..0134d185b90 100644
> --- a/gcc/analyzer/call-string.h
> +++ b/gcc/analyzer/call-string.h

[...snip...]

> +
> +  void push_call (const supernode *src, 
> +const supernode *dest);

There's some odd indentation here.  Does your text editor have option
to
(a) show visible whitespace (distinguish between tabs vs spaces)
(b) enforce a coding standard?

If your editor supports it, it's easy to comply with a project's coding
standards, otherwise it can be a pain.

[...snip...]

>  private:
> -  auto_vec<const return_superedge *> m_return_edges;
> +  //auto_vec<const return_superedge *> m_return_edges;
> +  auto_vec<std::pair<const supernode *, const supernode *> *> 
> m_supernodes;
>  };

Commenting out lines is OK during prototyping.  Obviously as the patch
gets closer to be being ready we want to simply delete them instead.

[...]

> >From 95572742f1aaad1975aa35a663e8b26e671d4323 Mon Sep 17 00:00:00 2001
> From: Ankur Saini 
> Date: Sat, 10 Jul 2021 19:28:49 +0530
> Subject: [PATCH 2/2]

Re: daily report on extending static analyzer project [GSoC]

2021-07-11 Thread David Malcolm via Gcc
On Sun, 2021-07-11 at 22:31 +0530, Ankur Saini wrote:
> AIM for today : 
> 
> - get "state_purge_per_ssa_name::process_point () “ to  go from the
> “return" supernode to the “call” supernode.
> - fix bug https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100546 in the process 
> - test and observe the effect of changes done so far on vfunc calls
> 
> —
> 
> PROGRESS :
> 
> - In order to go from the “return” supernode to the “call” supernode, I
> used the fact that the return supernode will have a GIMPLE call statement
> which, when passed to “get_supernode_for_stmt ()”, returns a pointer to
> the “call” supernode. 
> 
> now that part of the function look something like this 
> 
> File: {SCR_DIR}/gcc/analyzer/state-purge.cc 
> 
> 347:    /* Add any intraprocedually edge for a call.  */
> 348:    if (snode->m_returning_call)
> 349:      {
> 350:        cgraph_edge *cedge
> 351:          = supergraph_call_edge (snode->m_fun,
> 352:                                  snode->m_returning_call);
> 353:        if(!cedge)
> 354:          {
> 355:            supernode* callernode = map.get_sg ().get_supernode_for_stmt(snode->m_returning_call);
> 356:            gcc_assert (callernode);
> 357:            add_to_worklist
> 358:              (function_point::after_supernode (callernode),
> 359:               worklist, logger);
> 360:          }
> 361:        else
> 362:          {
> 363:            gcc_assert (cedge);
> 364:            superedge *sedge
> 365:              = map.get_sg ().get_intraprocedural_edge_for_call (cedge);
> 366:            gcc_assert (sedge);
> 367:            add_to_worklist
> 368:              (function_point::after_supernode (sedge->m_src),
> 369:               worklist, logger);
> 370:          }
> 371:      }
> 
> - now the patch also fixes bug #100546 (
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100546 ) and doesn’t give
> out a false report about dereferencing a null pointer which will never
> happen.

Excellent.  You should add a testcase for that bug to the test suite.

> 
> - now I tested it with vfuncs to see what happens in that case; the
> results were as expected: the analyzer detects the call to the virtual
> function and splits the call and returning supernodes, but did not
> understand which function to call, making the nodes after it unreachable. 
> 
> - Now if we are somehow able to update the region model to understand
> which function is called ( or may be called ), then the analyzer can
> easily call and analyze that virtual function call.

I had some ideas about how to do this here:
  https://gcc.gnu.org/pipermail/gcc/2021-April/235335.html
which might work for simple cases where we have a code path through a
ctor of a known subclass

...but I haven't looked in detail at ipa-devirt.c yet, so I might be
wrong.

> 
> 
> STATUS AT THE END OF THE DAY :- 
> 
> - get "state_purge_per_ssa_name::process_point () “ to  go from the
> “return" supernode to the “call” supernode. ( done )
> - fix bug https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100546 in the process. (
> done )
> - test and observe the effect of changes done so far on vfunc calls (
> done )
> 
> —
> P.S. 
> regarding the patch I sent to mailing list yesterday. I found it,
> apparently the mail was detected as a "spam mail” by my system and was
> redirected  to my spam inbox. 

Strange.  I didn't see it in my spam folder.

> Btw I am also attaching that patch file with this mail for the records.

Thanks.  I've replied to it in another email here:
  https://gcc.gnu.org/pipermail/gcc/2021-July/236726.html

Dave



Re: Benefits of using Sphinx documentation format

2021-07-12 Thread David Malcolm via Gcc
On Mon, 2021-07-12 at 15:25 +0200, Martin Liška wrote:
> Hello.
> 
> Let's make it a separate sub-thread where we can discuss motivation
> why
> do I want moving to Sphinx format.
> 
> Benefits:
> 1) modern looking HTML output (before: [1], after: [2]):

"modern looking" is rather subjective; I'd rate Sphinx's output as
looking like it's from 2010s (last decade), whereas Texinfos' looks
like it's from the 1990s.  In theory this ought not to matter, but
every time I look at our documentation it gives me a depressing
feeling, reminiscent of a graveyard, that discourages me from fixing
things.

>     a) syntax highlighting for examples (code, shell commands, etc.)

...with support for multiple programming languages, potentially on the
same page.  For example, in the libgccjit docs:
  https://gcc.gnu.org/onlinedocs/jit/intro/tutorial02.html
we can have a mixture of C, assembler and shell on one page, and each
example is syntax-highlighted accordingly.  It's not clear to me how to
do that in texinfo, since there needs to be a way to express what
language an example is in.

>     b) precise anchors, the current Texinfo anchors are not displayed
> (start with first line of an option)

...and the URLs are sane and stable (so e.g. there is a reliable,
guessable, readable URL for the docs for say, "-Wall").

>     c) one can easily copy a link to an anchor (displayed as ¶)
>     d) internal links are working, e.g. one can easily jump from
> listing of options
>     e) left menu navigation provides better orientation in the manual
>     f) Sphinx provides internal search capability: [3]

...also (quoting myself in places here from 2015
  https://gcc.gnu.org/pipermail/gcc-patches/2015-November/434055.html 
):

* the ability to include fragments of files: libgccjit's documentation
uses directives to include code from the test suite, so that all of the
code examples are also part of the test suite, and are thus known to
compile), allowing for (almost) literate programming.  [That said, the
build of libgccjit's docs on gcc.gnu.org seems to be missing those
fragments; I wonder if there's a path or version issue?]

* a page-splitting structure that make sense, to me, at least (I have
never fathomed the way texinfo's navigation works, for HTML, at least,
and I believe I'm not the only one; I generally pick the all-in-one-
HTML-page option when viewing texinfo-html docs and do textual
searches, since otherwise I usually can't find the thing I'm looking
for (or have to resort to a brute-force depth-first search of clicking
through the links).)

* much more use of markup, with restrained and well-chosen CSS
(texinfo's HTML seems to ignore much of the inline markup in
the .texinfo file)

> 2) internal links are also provided in PDF version of the manual
> 3) some existing GCC manuals are already written in Sphinx (GNAT
> manuals and libgccjit)
> 4) support for various output formats, some people are interested in
> ePUB format
> 5) Sphinx is using RST which is quite minimal semantic markup language

Sphinx is also used by many high-profile FLOSS projects (e.g. the Linux
kernel, LLVM, and the Python community), so it reduces the barrier to
entry for new contributors, relative to texinfo.


> 6) TOC is automatically generated - no need for manual navigation
> like seen here: [5]
> 
> Disadvantages:
> 
> 1) info pages are currently missing Page description in TOC
> 2) rich formatting is leading to extra wrapping in info output -
> beings partially addresses in [4]
> 3) one needs e.g. Emacs support for inline links (rendered as notes)
> 
> I'm willing to address issue 1) in next weeks and I tend to skip
> emission of links as mentioned in 3).
> Generally speaking, I'm aware that some people still use Info, but I
> think we should more focus
> on more modern documentation formats. That's HTML (and partially
> PDF).

I think the output formats we need to support are:
- HTML
- PDF
- man page (hardly "modern", but still used)

I regared "info" as merely "nice to have" - I don't know anyone who
uses it other than some core GNU contributors.

Dave

> 
> Martin
> 
> [1]
> https://gcc.gnu.org/onlinedocs/gcc/Optimize-Options.html#index-fstrict-aliasing
> [2]
> https://splichal.eu/gccsphinx-final/html/gcc/gcc-command-options/options-that-control-optimization.html#cmdoption-fstrict-aliasing
> [3]
> https://splichal.eu/gccsphinx-final/html/gcc/search.html?q=-fipa-icf&check_keywords=yes&area=default#
> [4] https://github.com/sphinx-doc/sphinx/pull/9391
> [5] @comment node-name, next,  previous, up
>  @node    Installing GCC, Binaries, , Top
> 




Re: daily report on extending static analyzer project [GSoC]

2021-07-14 Thread David Malcolm via Gcc
On Mon, 2021-07-12 at 22:07 +0530, Ankur Saini wrote:
> > 
> > On 11-Jul-2021, at 11:19 PM, David Malcolm 
> > wrote:
> > 
> > On Sat, 2021-07-10 at 21:27 +0530, Ankur Saini wrote:

[...]

> > > 
> > > - for the callstring patch, I created a patch file ( using git
> > > format-
> > > patch ) and sent it to patches mailing list (via git send email )
> > > and
> > > CCed my mentor.
> > > Although I received a positive reply from the command log (git send
> > > email) saying the mail was sent , I didn’t received that mail (
> > > being
> > > subscribed to the patches list ) regarding the same ( I sent that
> > > just
> > > before sending this mail ).
> > > The mail should be sent from arse...@sourceware.org
> > 
> > Thanks.
> > 
> > I see the patch email in the list archives here:
> >  https://gcc.gnu.org/pipermail/gcc-patches/2021-July/574888.html
> > but for some reason it's not showing up in my inbox.  I'm not sure
> > why;
> > I recently got migrated to a new email server and my filters are
> > currently a mess so it could be a problem at my end; sorry if that's
> > the case.
> > 
> > Given that neither you nor I seem to have received the patch I wonder
> > if anyone else received it?
> 
> Then I think it’s better to attach patch file with the updates here
> instead.

FWIW I use "git send-email".  Might be worth trying that again to see
if it happens again, or if it was a one-time glitch.

[...]

> 
> 
> > 
> > If your editor supports it, it's easy to comply with a project's
> > coding
> > standards, otherwise it can be a pain.
> 
> Oh, I see. This explains the weird indentation convention I was seeing
> throughout the source. Actually my editor dynamically adjusts the width
> of the tab depending on the style used in source file and due to some
> reasons, it decided that it was 2 space wide here, this was leading to
> some weird indentations throughout the source. 
> Well now it should be fixed, I manually adjusted it to be standard 8
> wide now and switched of converting tabs to spaces in my editor
> settings.

Well, it is 2 spaces wide, but using tabs to take the place of 8 spaces
at a time when the indentation gets too large.

[...]
> 
> 
> > 
> > [...]
> > 
> > > > From 95572742f1aaad1975aa35a663e8b26e671d4323 Mon Sep 17 00:00:00
> > > > 2001
> > > From: Ankur Saini <arse...@sourceware.org>
> > > Date: Sat, 10 Jul 2021 19:28:49 +0530
> > > Subject: [PATCH 2/2] analyzer: make callstring's pairs of
> > > supernodes
> > > statically allocated [GSoC]
> > > 
> > >    2021-07-10  Ankur Saini  <arse...@sourceware.org>
> > > 
> > > gcc/analyzer/
> > >    * call-string.cc : store a
> > > vector of std::pair of supernode* instead of pointer to them
> > >    * call-string.h: create a typedef for "auto_vec > > std::pair*> m_supernodes;" to
> > > enhance readibility
> > 
> > ...and to avoid having really long lines!
> > 
> > >    * program-point.cc : refactor
> > > program point to work with new call-string format
> > 
> > I think it's going to be much easier for me if you squash these two
> > patches into a single patch so I can review the combined change.  (If
> > you've not seen it yet, try out "git rebase -i" to see how to do
> > this).
> 
> woah, this is magic !
> I always use to perform a soft reset ( git reset —soft  ) and
> commit in order to squash or reword my commits before, but never knew
> we could change history locally like this, amazing : D

I love "git rebase -i" and "git add -p"; together they make me look
like a much better programmer than I really am :)

[...]

> > > 
> > > + typedef std::pair<const supernode *, const supernode *> element_t;
> > 
> > Rather than a std::pair, I think a struct inside call_string like
> > this
> > would be better: rather than "first" and "second" we can refer to
> > "m_caller" and "m_callee", which is closer to being self-documenting,
> > and it allows us to add member functions e.g. "get_caller_function":
> > 
> > class call_string
> > {
> > public:
> >  struct element_t
> >  {
> >    element_t (const supernode *caller, const supernode *callee)
> >    : m_caller (caller), m_callee (callee)
> >    {
> >    }
> > 
> >    function *get_caller_function () const {/*etc*/}
> >    function *get_callee_function () const {/*etc*/}
> > 
> >    const supernode *m_caller;
> >    const supernode *m_callee;
> >  };
> > 
> > };
> > 
> > [...snip...]
> > 
> > which might clarify the code further.
> 
> Instead of putting that struct inside the class, I declared it globally
> and overloaded some basic operators ( “==“ and “!=“ ) to make it work
> without having to change much of how it is being handled in other areas
> of the source ( program_point.cc and engine.cc ).

Fair enough, but calling it "element_t" seems too generic to me if it's
going to be in global scope.  I w

Re: daily report on extending static analyzer project [GSoC]

2021-07-14 Thread David Malcolm via Gcc
On Wed, 2021-07-14 at 22:41 +0530, Ankur Saini wrote:
> CURRENT STATUS OF PROJECT:
> 
> - The analyzer can now successfully detect and analyze function calls
>   that don't have a callgraph edge ( like a call via function pointer )

Excellent.

> 
> - A weird indentation problem caused by my text editor, pointed out in
>   one of the previous mails
>   ( https://gcc.gnu.org/pipermail/gcc/2021-July/236747.html ), which,
>   despite being fixed, still messed up the indentation in all of the
>   changes I have done so far.
> 
> - the analyser can still not detect a call via vtable pointer
> 
> ---
> AIM FOR TODAY: 
> 
> - Complete the first evaluation of GSoC
> - Fix the indentation errors generated by my editor in the changes
>   done till now
> - Add the tests to regress testing 
> - Create a ChangeLog for the next patch 
> - Attach the patch with this mail 
> - Layout a new region subclass for vtables ( getting ready for next
> patch )
> 
> ---
> PROGRESS  :
> 
> - To fix the indentation problem, I simply created a diff and fixed all
>   of them manually. I also found and read a doc regarding the coding
>   conventions used by GCC
>   (https://gcc.gnu.org/codingconventions.html) and refactored the
>   changes and changelog to follow this.

Great.

> 
> - After that I branched out and laid out the foundation for the next
>   update and started creating a region subclass for vtables
>   ( vtable_region ), which currently does nothing
> 
> - After that, in order to give some final finishing touches to the
>   previous changes, I created the changelog and added 2 more tests to
>   the analyzer testsuite as follows :
> 
>   1. (function-ptr-4.c)
>   ```
[...snip...]
>   ```
>   (godbolt link )

Looks promising.

Does this work in DejaGnu?  The directive:
  /* { dg-warning "double-‘free’ of ‘int_ptr’" } */
might need changing to:
  /* { dg-warning "double-'free' of 'int_ptr'" } */
i.e. fixing the quotes to use ASCII ' rather than ‘ and ’.

It's worth running the testcases with LANG=C when generating the
expected outputs.  IIRC this is done automatically by the various "make
check-*".


> 
>   2. ( pr100546.c: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100546 )
>   ```
>   #include 
>   #include 
>   
>   static void noReturn(const char *str) __attribute__((noreturn));
>   static void noReturn(const char *str) {
>   printf("%s\n", str);
>   exit(1);
>   }
>   
>   void (*noReturnPtr)(const char *str) = &noReturn;
>   
>   int main(int argc, char **argv) {
>   char *str = 0;
>   if (!str)
>   noReturnPtr(__FILE__);
>   return printf("%c\n", *str);
>   }
>   ```
>   (godbolt link )
> 
> - But at the time of testing ( command used 
>   was `make check-gcc RUNTESTFLAGS="-v -v analyzer.exp=pr100546.c"`),
> both of 
>   them failed unexpectedly with Segmentation fault at the call
> 
> - From further inspection, I found out that this is due to the
>   "-fanalyzer-call-summaries" option, which looks like it activates
>   call summaries
> 
> - I would look into this in more detail ( with gdb ) tomorrow; right
>   now my guess is that this is either due to the changes I did in
>   state-purge.cc or is a call-summary related problem ( I remember it
>   not being perfectly implemented right now ).

I'm not proud of the call summary code, so that may well be the
problem.

Are you able to use gdb on the analyzer?  It ought to be fairly
painless to identify where a segfault is happening, so let me know if
you're running into any problems with that.

> 
> ---
> STATUS AT THE END OF THE DAY :- 
> 
> - Complete the first evaluation of GSoC ( done )
> - Fix the indentation errors generated by my editor in the changes
>   done till now ( done )
> - Layout a new region subclass for vtables ( done )
> - Create a ChangeLog for the next patch ( done )
> - Add the tests to regress testing ( pending )
> - Attach the patch with this mail ( pending )
> 
> ---
> HOUR-O-METER :- 
> no. of hours spent on the project today : 4 hours
> Grand total (by the end of 14th July 2021): 195 hours

Thanks for estimating the time you're spending on the project.  I'm
wondering what the "grand total" above is covering: are you counting
the application and "community bonding" periods, or just the "coding"
period?

Do you have more of a per-week breakdown for the coding period?

The guidance from Google is that students are expected to spend
roughly 175 hours total in the coding period of a GSoC 2021 project,
so I'm a bit alarmed if you've already spent more than that time when
we're only halfway through.

Dave



Re: daily report on extending static analyzer project [GSoC]

2021-07-16 Thread David Malcolm via Gcc
On Fri, 2021-07-16 at 21:04 +0530, Ankur Saini wrote:
> 
> 
> > On 15-Jul-2021, at 4:53 AM, David Malcolm 
> > wrote:
> > 
> > On Wed, 2021-07-14 at 22:41 +0530, Ankur Saini wrote:
> > 
> > 

[...snip...]

> > 
> > > 
> > >   2. ( pr100546.c: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100546 )
> > >   ```
> > >   #include 
> > >   #include 
> > >   
> > >   static void noReturn(const char *str) __attribute__((noreturn));
> > >   static void noReturn(const char *str) {
> > >   printf("%s\n", str);
> > >   exit(1);
> > >   }
> > >   
> > >   void (*noReturnPtr)(const char *str) = &noReturn;
> > >   
> > >   int main(int argc, char **argv) {
> > >   char *str = 0;
> > >   if (!str)
> > >   noReturnPtr(__FILE__);
> > >   return printf("%c\n", *str);
> > >   }
> > >   ```
> > >   (godbolt link: https://godbolt.org/z/aWfW51se3 )
> > > 
> > > - But at the time of testing ( command used 
> > >   was `make check-gcc RUNTESTFLAGS="-v -v
> > > analyzer.exp=pr100546.c"`),
> > > both of 
> > >   them failed unexpectedly with Segmentation fault at the call
> > > 
> > > - From further inspection, I found out that this is due 
> > >   "-fanalyzer-call-summaries" option, which looks like activats
> > > call
> > > summaries
> > > 
> > > - I would look into this in more details ( with gdb ) tomorrow,
> > > right
> > > now 
> > >   my guess is that this is either due too the changes I did in
> > > state-
> > > purge.cc 
> > >   or is a call-summary related problem ( I remember it not being 
> > >   perfetly implemented right now). 
> > 
> > I'm not proud of the call summary code, so that may well be the
> > problem.
> > 
> > Are you able to use gdb on the analyzer?  It ought to be fairly
> > painless to identify where a segfault is happening, so let me know if
> > you're running into any problems with that.
> 
> Yes, I used gdb on the analyzer to go into details, and it looks like I
> was correct: the program was crashing in “analysis_plan::use_summary_p ()”
> on line 114 ( const cgraph_node *callee = edge->callee; ), where the
> program was trying to access a callgraph edge which didn’t exist.
> 
> I fixed it by simply making the analyzer not use call summaries in the
> absence of a callgraph edge.
> 
> File: {src-dir}/gcc/analyzer/analysis-plan.cc
> 
> 105: bool
> 106: analysis_plan::use_summary_p (const cgraph_edge *edge) const
> 107: {
> 108:   /* Don't use call summaries if -fno-analyzer-call-summaries.  */
> 109:   if (!flag_analyzer_call_summaries)
> 110: return false;
> 111: 
> 112:   /* Don't use call summaries if there is no callgraph edge */
> 113:   if(!edge || !edge->callee)
> 114: return false;
> 
> and now the tests are passing successfully. ( both manually and via
> DejaGnu ).

Great.

> 
> I have attached a sample patch of the work done till now with this mail
> for review ( I haven’t sent this one to the patches list as its
> changelog is not complete yet ).
> 
> P.S. I have also sent another mail
> ( https://gcc.gnu.org/pipermail/gcc-patches/2021-July/575396.html ) to
> the patches list with the previous call-string patch, and this time it
> popped up in my inbox as it should; did you also receive it now?

I can see it in the archive URL, but for some reason it's not showing
up in my inbox.  Bother.  Please can you try resending it directly to
me?

Putting email issues to one side, the patch you linked to above looks
good.  To what extent has it been tested?  If it bootstraps and passes
the test suite, it's ready for trunk.

Note that over the last couple of days I pushed my "use of
uninitialized values" detection work to trunk (aka master), along with
various other changes, so it's worth pulling master and rebasing on top
of that before testing.  I *think* we've been touching different parts
of the analyzer code, but there's a chance you might have to resolve
some merge conflicts.

As for the patch you attached to this email
  "[PATCH] analyzer: make analyer detect calls via function pointers"
here's an initial review:

> diff --git a/gcc/analyzer/analysis-plan.cc b/gcc/analyzer/analysis-plan.cc
> index 7dfc48e9c3e..1c7e4d2cc84 100644
> --- a/gcc/analyzer/analysis-plan.cc
> +++ b/gcc/analyzer/analysis-plan.cc
> @@ -109,6 +109,10 @@ analysis_plan::use_summary_p (const cgraph_edge *edge) 
> const
>if (!flag_analyzer_call_summaries)
>  return false;
>  
> +  /* Don't use call summaries if there is no callgraph edge */
> +  if(!edge || !edge->callee)
> +return false;

Is it possible for a cgraph_edge to have a NULL callee?  (I don't think
so, but I could be wrong)

Nit: missing space between "if" and open-paren

> diff --git a/gcc/analyzer/engine.cc b/gcc/analyzer/engine.cc
> index 7662a7f7bab..f45a614c0ab 100644
> --- a/gcc/analyzer/engine.cc
> +++ b/gcc/analyzer/engine.cc

Re: daily report on extending static analyzer project [GSoC]

2021-07-22 Thread David Malcolm via Gcc
On Wed, 2021-07-21 at 21:44 +0530, Ankur Saini wrote:
> 
> 
> > On 17-Jul-2021, at 2:57 AM, David Malcolm 
> > wrote:
> > 
> > On Fri, 2021-07-16 at 21:04 +0530, Ankur Saini wrote:
> > > 
> > > 
> > > > On 15-Jul-2021, at 4:53 AM, David Malcolm 
> > > > wrote:
> > > > 
> > > > On Wed, 2021-07-14 at 22:41 +0530, Ankur Saini wrote:
> > > > 
> > > > 
> > 

[...snip...]

> > > 
> > > I have attached a sample patch of work done till now with this
> > > mail for
> > > review ( I haven’t sent this one to the patches list as it’s
> > > change log
> > > was not complete for now ).
> > > 
> > > P.S. I have also sent another mail
> > > ( https://gcc.gnu.org/pipermail/gcc-patches/2021-July/575396.html )
> > > to the patches list with the previous call-string patch, and this
> > > time it popped up in my inbox as it should; did you also receive it
> > > now?
> > 
> > I can see it in the archive URL, but for some reason it's not
> > showing
> > up in my inbox.  Bother.  Please can you try resending it directly
> > to
> > me?
> 
> Ok, I have sent the call-string patch directly to you. I have
> actually sent 2 mails ( from different mail ids ) to check if it’s
> the id which is causing the issue or the contents of the mail itself.

I've been looking, but I don't see the patch.  Sorry about this.

> 
> > 
> > Putting email issues to one side, the patch you linked to above
> > looks
> > good.  To what extent has it been tested?  If it bootstraps and
> > passes
> > the test suite, it's ready for trunk.
> 
> It bootstrapped successfully on a couple of the x86_64 machines ( on
> the gcc farm ), and regression testing is underway.

Great.

[...snip]

> > 
> > In any case, the output above is missing some events: I think
> > ideally
> > it would show the calls from *fun_ptr to fun and the returns,
> > giving
> > something like the following (which I mocked up by hand):
> > 
> >  'double_call': events 1-3
> >    |
> >    |   16 | void double_call()
> >    |  |  ^~~
> >    |  |  |
> >    |  |  (1) entry to 'double_call'
> >    |   17 | {
> >    |   18 | int *int_ptr = (int*)malloc(sizeof(int));
> >    |  |  ~~~
> >    |  |  |
> >    |  |  (2) allocated here
> >    | ...  |
> >    |   19 | (*fun_ptr)(int_ptr);
> >    |  | ^~~
> >    |  | |
> >    |  | (3) calling 'fun' from 'double-call'
> >    |
> >    +--> 'fun': events 3-6
> >   |
> >   |    4 | void fun(int *int_ptr)
> >   |  |  ^~~
> >   |  |  |
> >   |  |  (4) entry to ‘fun’
> >   |    5 | {
> >   |    6 | free(int_ptr);
> >   |  | ~
> >   |  | |
> >   |  | (5) first 'free' here
> >   |
> >    <--+
> >    |
> >  'double_call': events 6-7
> >    |
> >    |   19 | (*fun_ptr)(int_ptr);
> >    |  | ^~~
> >    |  | |
> >    |  | (6) returning to 'double-call' from 'fun'
> >    |   20 | (*fun_ptr)(int_ptr);
> >    |  | ^~~
> >    |  | |
> >    |  | (7) calling 'fun' from 'double-call'
> >    |
> >    +--> 'fun': events 8-9
> >   |
> >   |    4 | void fun(int *int_ptr)
> >   |  |  ^~~
> >   |  |  |
> >   |  |  (8) entry to ‘fun’
> >   |    5 | {
> >   |    6 | free(int_ptr);
> >   |  | ~
> >   |  | |
> >   |  | (9) second 'free' here; first 'free' was
> > at (5)
> > 
> > The events are created in diagnostic-manager.cc
> > 
> > Am I right in thinking that there's a interprocedural superedge for
> > the
> > dynamically-discovered calls?
> 
> No there isn’t, such calls will only have an exploded edge and no
> interprocedural superedge
> 
> > 
> > diagnostic_manager::add_events_for_superedge creates events for
> > calls
> > and returns, so maybe you just need to update the case
> > SUPEREDGE_INTRAPROCEDURAL_CALL there, to do something for the
> > "dynamically discovered edge" cases (compare it with the other
> > cases in
> > that function).   You might need to update the existing call_event
> > and
> > return_event subclasses slightly (see checker-path.cc/h)
> 
> As we already have exploded edges representing the call, my idea was
> to add events for such cases via custom edge info ( similar to what we
> have for the longjmp case ) instead of creating a special case in
> diagnostic_manager::add_events_for_superedge ().

That sounds like it could work too.

Dave

> 
> > 
> > Ideally, event (7) above would read
> >  "passing freed pointer 'int_ptr' in call to '

Re: daily report on extending static analyzer project [GSoC]

2021-07-22 Thread David Malcolm via Gcc
On Thu, 2021-07-22 at 22:40 +0530, Ankur Saini wrote:
> AIM FOR TODAY: 
> 
> - Add custom edge info to the eedges created for dynamically
> discovered calls
> - Add the custom events to be showing in diagnostics
> - update call_event and return_event to also work for the cases where
> there is no underlying superedge representing the call
> 
> ---
> PROGRESS  :
> 
> - I created a "dynamic_call_info_t" subclass representing custom info
>   on the edge representing the dynamically discovered calls
> 
> - I overloaded its "add_events_to_path ()" function to add call and
>   return events to the checker path
> 
> - Now the call_event and return_event subclasses mostly make use of
>   the underlying interprocedural superedge representing the call to
>   work properly. To tackle this problem, I used the same method I used
>   for the callstring patch earlier ( working with src and dest
>   supernodes instead of the superedge ).
> 
> - The call_event subclass (and the same applies to the return_event
>   subclass) now has 2 additional pointers to the source and destination
>   supernodes representing the call in the absence of a superedge.
> 
> - I have also tweaked a few more things to make it work; I think the
>   best way to show them all is to attach a patch ( it should be
>   attached with this mail ) with just the changes I did today, for a
>   better understanding of what exactly I have changed since the last
>   update ( this patch would be squashed into the previous one before
>   the final review ).

It's much easier to understand via the patch :)

> 
> - After all the changes done, the analyzer now emits the following
>   error message for the test program
>   ( godbolt link: https://godbolt.org/z/Td8n4c9a6 ), which I think now
>   includes all the events it was missing before.
> 
> ```
> test.c: In function ‘fun’:
> test.c:6:9: warning: double-‘free’ of ‘int_ptr’ [CWE-415] [-
> Wanalyzer-double-free]
>     6 | free(int_ptr);
>   | ^
>   ‘double_call’: events 1-3
>     |
>     |   16 | void double_call()
>     |  |  ^~~
>     |  |  |
>     |  |  (1) entry to ‘double_call’
>     |   17 | {
>     |   18 | int *int_ptr = (int*)malloc(sizeof(int));
>     |  |  ~~~
>     |  |  |
>     |  |  (2) allocated here
>     |   19 | void (*fun_ptr)(int *) = &fun;
>     |   20 | (*fun_ptr)(int_ptr);
>     |  | ~~~
>     |  |  |
>     |  |  (3) calling ‘fun’ from ‘double_call’
>     |
>     +--> ‘fun’: events 4-5
>    |
>    |    4 | void fun(int *int_ptr)
>    |  |  ^~~
>    |  |  |
>    |  |  (4) entry to ‘fun’
>    |    5 | {
>    |    6 | free(int_ptr);
>    |  | ~
>    |  | |
>    |  | (5) first ‘free’ here
>    |
>     <--+
>     |
>   ‘double_call’: events 6-7
>     |
>     |   20 | (*fun_ptr)(int_ptr);
>     |  | ~^~
>     |  |  |
>     |  |  (6) returning to ‘double_call’ from ‘fun’
>     |   21 | (*fun_ptr)(int_ptr);
>     |  | ~~~
>     |  |  |
>     |  |  (7) calling ‘fun’ from ‘double_call’
>     |
>     +--> ‘fun’: events 8-9
>    |
>    |    4 | void fun(int *int_ptr)
>    |  |  ^~~
>    |  |  |
>    |  |  (8) entry to ‘fun’
>    |    5 | {
>    |    6 | free(int_ptr);
>    |  | ~
>    |  | |
>    |  | (9) second ‘free’ here; first ‘free’ was
> at (5)
>    |
> ```

Looks great.


> 
> ---
> STATUS AT THE END OF THE DAY :- 
> 
> - Add custom edge info to the eedges created for dynamically
> discovered calls (done )
> - Add the custom events to be showing in diagnostics (done)
> - update call_event and return_event to also work for the cases where
> there is no underlying superedge representing the call (done)
> 
> --- 
> Question / doubt :- 
> 
> - In "case EK_RETURN_EDGE” of
> "diagnostic_manager::prune_for_sm_diagnostic ()” function. 
> 
> File:{source_dir}/gcc/analyzer/diagnostic-manager.cc
> 2105:   log ("event %i:"
> 2106:    " recording critical state for %qs at
> return"
> 2107:    " from %qE in caller to %qE in callee",
> 2108:    idx, sval_desc.m_buffer, callee_var,
> callee_var);
> 
> shouldn’t it be 
> 
> 2107:    " from %qE in caller to %qE in callee",
> 2108:    idx, sval_desc.m_buffer, caller_var,
> callee_var);

Good catch: I think it should be the latter version.  (posting it as a
unified diff would make it easier f

Re: daily report on extending static analyzer project [GSoC]

2021-07-29 Thread David Malcolm via Gcc
On Thu, 2021-07-29 at 18:20 +0530, Ankur Saini wrote:
> I have attached the patches (one is the updated version of the
> previous patch to detect calls via function pointers) of the changes
> done to make the analyzer understand calls to virtual functions, for
> initial review.
> 
> 1. I decided to make a dedicated function to create enodes and eedges
> for the dynamically discovered calls, as I found myself using the exact
> same piece of code again to analyse vfunc calls.

Makes sense.

> 
> 2. Bootstrapping and testing of these changes are underway.
> 
> 3. Regarding the regression tests that have to be added to test the
> functionality of the vfunc extension patch :
> Should I add many test files for different types of inheritance, or
> should I add one ( or two ) test files with a lot of functions in them,
> testing different types of calls ?

Both approaches have merit, and there's an element of personal taste.

I find that during development and debugging it's handy to have the
tests broken out into individual files, but it's good to eventually
combine the tests to minimize the number of invocations that the test
harness has to do.

That said, interprocedural tests tend to be fiddly, so it's often good
to keep these in separate files.

I tend to combine my tests and add them to git, and then to temporarily
trim them down when debugging them to minimize the amoung of unrelated
stuff I'm having to look at when debugging, knowing that git has the
full version saved.

I hope that answers your question.

> 
> ---
> Patches :

This isn't a full review, but...

fn_ptr.patch:

> diff --git a/gcc/testsuite/gcc.dg/analyzer/function-ptr-4.c 
> b/gcc/testsuite/gcc.dg/analyzer/function-ptr-4.c
> new file mode 100644
> index 000..c62510c026f
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/analyzer/function-ptr-4.c
> @@ -0,0 +1,25 @@
> +/* Test to see if the analyzer detects and analyzes calls via
> +   function pointers or not.  */
> +
> +#include 
> +#include 
> +
> +void fun(int *int_ptr)
> +{
> + free(int_ptr); /* { dg-warning "double-'free' of 'int_ptr'" } */
> +}
> +
> +void single_call()
> +{
> + int *int_ptr = (int*)malloc(sizeof(int));
> + void (*fun_ptr)(int *) = &fun;
> + (*fun_ptr)(int_ptr);
> +}
> +
> +void double_call()
> +{
> + int *int_ptr = (int*)malloc(sizeof(int));
> + void (*fun_ptr)(int *) = &fun;
> + (*fun_ptr)(int_ptr);
> + (*fun_ptr)(int_ptr);
> +}

...thinking back to our discussion about events, it would be good to
verify that the analyzer is emitting them.  You can put directives
like:

   /* { dg-message "calling 'fun' from 'double_call'" } */

on the appropriate lines to test this via DejaGnu.
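
For instance, here's a sketch of how that might look in the 'double_call'
test above (the exact event wording and placement would need checking
against what the analyzer actually emits):

  void double_call()
  {
    int *int_ptr = (int*)malloc(sizeof(int)); /* { dg-message "allocated here" } */
    void (*fun_ptr)(int *) = &fun;
    (*fun_ptr)(int_ptr); /* { dg-message "calling 'fun' from 'double_call'" } */
    (*fun_ptr)(int_ptr); /* { dg-message "calling 'fun' from 'double_call'" } */
  }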

"analyzer: detect and analyzer vfunc calls"

[...snip...]

> @@ -1242,6 +1243,17 @@ exploded_node::on_stmt (exploded_graph &eg,
>   unknown_side_effects = false;
>  }
>  
> +  /* If the statement is a polymorphic call then assume
> + there are no side effects.  */
> +  gimple *call_stmt = const_cast <gimple *> (stmt);
> +  if (gcall *call = dyn_cast <gcall *> (call_stmt))
> +  {
> +function *fun = this->get_function();
> +cgraph_edge *e = cgraph_node::get (fun->decl)->get_edge (call);
> +if ((e && e->indirect_info) && (e->indirect_info->polymorphic))
> +unknown_side_effects = false;
> +  }
> +

This seems wrong; surely it depends on what the call is - or am I
missing something?  Is the issue that we're speculating lots of
possibilities as dynamic calls?  If so, would it be better to terminate
the remaining analysis path (if that makes sense), and assume that any
further analysis happens on extra edges added for the speculated calls?

FWIW I've been experimenting with adding "bifurcation" support so that
you can do:
  program_state *other = ctxt->bifurcate ();
and have it split the analysis into states (e.g. for handling realloc,
so that we can split into 3 states: "succeeded", "succeeded but moved",
"failed").  Unfortunately my code for this is a mess (it's a hacked up
prototype).  Should I try to post what I have for this?


[...snip...]


> @@ -3327,9 +3338,44 @@ exploded_graph::process_node (exploded_node *node)
>  region_model *model = state.m_region_model;
>  
>  /* Call is possibly happening via a function pointer.  */
> -if (tree fn_decl = model->get_fndecl_for_call(call,&ctxt))
> -  create_dynamic_call (call, fn_decl, node, next_state,
> -   next_point, &uncertainty);
> +if (tree fn_decl = model->get_fndecl_for_call (call,&ctxt))
> +  create_dynamic_call (call,
> +fn_decl,
> +node,
> +next_state,
> +   next_point,
> +   &uncertainty);
> +else
> +  {
> +/* Call is possibly a polymor

Re: daily report on extending static analyzer project [GSoC]

2021-07-30 Thread David Malcolm via Gcc
On Fri, 2021-07-30 at 18:11 +0530, Ankur Saini wrote:
> 
> 
> > On 30-Jul-2021, at 5:35 AM, David Malcolm 
> > wrote:
> > 
> > On Thu, 2021-07-29 at 18:20 +0530, Ankur Saini wrote:

[..snip...]
> > > 
> 
> > 
> > > @@ -1242,6 +1243,17 @@ exploded_node::on_stmt (exploded_graph &eg,
> > > unknown_side_effects = false;
> > >     }
> > > 
> > > +  /* If the statmement is a polymorphic call then assume 
> > > + there are no side effects.  */
> > > +  gimple *call_stmt = const_cast(stmt);
> > > +  if (gcall *call = dyn_cast (call_stmt))
> > > +  {
> > > +    function *fun = this->get_function();
> > > +    cgraph_edge *e = cgraph_node::get (fun->decl)->get_edge
> > > (call);
> > > +    if ((e && e->indirect_info) && (e->indirect_info-
> > > >polymorphic))
> > > +    unknown_side_effects = false;
> > > +  }
> > > +
> > 
> > This seems wrong; surely it depends on what the call is - or am I
> > missing something?  Is the issue that we're speculating lots of
> > possibilities as dynamic calls?  If so, would it be better to
> > terminate
> > the remaining analysis path (if that makes sense), and assume that
> > any
> > further analysis happens on extra edges added for the speculated
> > calls?
> 
> Actually the issue here was that the analyzer was not able to find the
> body of the callee function here, and was treating all polymorphic calls
> as “calls to unknown functions”, resetting all of the state machines and
> not generating the desired diagnostics. I just changed it to assume
> there are no side effects of the call right now.
> Should I maybe eventually check for the unknown call later ( when the
> analyzer knows which function it is calling ) ?

I'm not sure.  I think it at least warrants a "FIXME" style comment in
the code above.  It also suggests some more test cases, covering the case
where the call does have side effects vs. where it doesn't: e.g. a
case where, say, A::foo() modifies a global variable, and B::foo()
doesn't; and something like:
  int saved_g = g;
  a->foo ();
  __analyzer_eval (saved_g == g);
could be used in DejaGnu to see if we know what got called.
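
Here's a rough, untested sketch of the kind of test I have in mind (the
class and function names are just illustrative, and the expected result
of the __analyzer_eval would need checking against what the analyzer
actually reports):

  /* The analyzer prints TRUE/FALSE/UNKNOWN for the argument of this
     special function; declared by hand here to keep the sketch
     self-contained.  */
  extern "C" void __analyzer_eval (int);

  int g;

  struct Base
  {
    virtual void foo () {}            /* no side effects */
  };

  struct Derived : Base
  {
    void foo () { g++; }              /* modifies the global */
  };

  void test ()
  {
    Derived d;
    Base *base_ptr = &d;
    int saved_g = g;
    base_ptr->foo ();
    __analyzer_eval (saved_g == g);   /* should be known to be false if the
                                         call to Derived::foo was understood */
  }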


> 
> > 
> > FWIW I've been experimenting with adding "bifurcation" support so
> > that
> > you can do:
> >  program_state *other = ctxt->bifurcate ();
> > and have it split the analysis into states (e.g. for handling
> > realloc,
> > so that we can split into 3 states: "succeeded", "succeeded but
> > moved",
> > "failed").  Unfortunately my code for this is a mess (it's a hacked
> > up
> > prototype).  Should I try to post what I have for this?
> 
> This surely looks like a thing which the project can take advantage of,
> maybe by bifurcating the analysis at virtual function calls. 

(nods)

I have some non-analyzer tasks to focus on today, so I'm not going to
have that code ready until next week.

[..snip...]

> > 
> > If we're speculating that a particular call happens, do we gain
> > information about what kind of object we're dealing with?
> 
> I think I used the wrong wording there, we are not quite "speculating”
> the calls, but are calling and analysing them instead.
> 
> > 
> > If we have a repeated call to the same vfunc, do we know that we're
> > calling the same function?
> 
> Yes, this is a problem I didn’t foresee. When a call can have multiple
> targets, the analyser acts as if all of those functions are called from
> that point, and then on the second function call it will do the same,
> leading to some weird diagnostic results. Looks like this is where the
> "bifurcation” might come in handy.
> 
> But for a lot of simple cases, apparently
> possible_polymorphic_call_targets () does the job well enough to give
> out a single target accurately ( even when a base class pointer is
> used ).
> 
> > 
> > [...snip...]
> > 
> > I'm interested in seeing what test cases you have.
> 
> I was testing it on a couple simple test programs ( which I think are
> not enough now ). 
> - - - 
> 1.  https://godbolt.org/z/qboq35bar  (
> analyser doesn’t generate any warnings as this one was just to check if
> the analyser is detecting calls correctly or not  )
> 
> super graph and exploded graph :-

FWIW some of these examples are hitting the complexity limit in
eg_traits::dump_args_t::show_enode_details_p and thus not showing the
details (which is probably a good thing; we don't want to be sending
huge attachments to the list).  Unfortunately, beyond a certain point,
the .dot dumps get unreadable.

Looking at test_1.cpp.eg.dot I see that after the constructor runs, the
state has e.g.:

  cluster for A a: (&constexpr int (* A::_ZTV1A [4])(...)+(sizetype)16)

which I think means that we "know" that the object vtable is the vtable
for A i.e. that this object is an A:

  $ echo "_ZTV1A" | c++filt
  vtable for A

The cluster dump is simplified if the value is for the whole object,
which is happening here, so the dump might be clearer with an example
that adds some member data in the ctor.  For that case, in the s

Re: daily report on extending static analyzer project [GSoC]

2021-08-04 Thread David Malcolm via Gcc
On Wed, 2021-08-04 at 21:32 +0530, Ankur Saini wrote:

[...snip...]
> 
> - From observation, a typical vfunc call that isn't devirtualised by
> the compiler's front end looks something like this 
> "OBJ_TYPE_REF(_2;(struct A)a_ptr_5(D)->0) (a_ptr_5(D))"
> where "a_ptr_5(D)" is pointer that is being used to call the virtual
> function.
> 
> - We can access its region to see what type of object the pointer is
> actually pointing to.
> 
> - This is then used to find a call with the DECL_CONTEXT of the object
> from all the possible targets of that polymorphic call.

[...]

> 
> Patch file ( prototype ) : 
> 

> +  /* Call is possibly a polymorphic call.
> +  
> + In such a case, use devirtualisation tools to find
> + possible callees of this function call.  */
> +  
> +  function *fun = get_current_function ();
> +  gcall *stmt  = const_cast <gcall *> (call);
> +  cgraph_edge *e = cgraph_node::get (fun->decl)->get_edge (stmt);
> +  if (e->indirect_info->polymorphic)
> +  {
> +void *cache_token;
> +bool final;
> +vec <cgraph_node *> targets
> +  = possible_polymorphic_call_targets (e, &final, &cache_token, true);
> +if (!targets.is_empty ())
> +  {
> +tree most_propbable_taget = NULL_TREE;
> +if(targets.length () == 1)
> + return targets[0]->decl;
> +
> +/* From the current state, check which subclass the pointer that 
> +   is being used to this polymorphic call points to, and use to
> +   filter out correct function call.  */
> +tree t_val = gimple_call_arg (call, 0);

Maybe rename to "this_expr"?


> +const svalue *sval = get_rvalue (t_val, ctxt);

and "this_sval"?

...assuming that that's what the value is.

Probably should reject the case where there are zero arguments.


> +
> +const region *reg
> +  = [&]()->const region *
> +  {
> +switch (sval->get_kind ())
> +  {
> +case SK_INITIAL:
> +  {
> +const initial_svalue *initial_sval
> +  = sval->dyn_cast_initial_svalue ();
> +return initial_sval->get_region ();
> +  }
> +  break;
> +case SK_REGION:
> +  {
> +const region_svalue *region_sval 
> +  = sval->dyn_cast_region_svalue ();
> +return region_sval->get_pointee ();
> +  }
> +  break;
> +
> +default:
> +  return NULL;
> +  }
> +  } ();
 
I think the above should probably be a subroutine.

That said, it's not clear to me what it's doing, or that this is correct.

I'm guessing that you need to see if
  *((void **)this)
is a vtable pointer (or something like that), and, if so, which class
it is for.

Is there a way of getting the vtable pointer as an svalue?

> +gcc_assert (reg);
> +
> +tree known_possible_subclass_type;
> +known_possible_subclass_type = reg->get_type ();
> +if (reg->get_kind () == RK_FIELD)
> +  {
> + const field_region* field_reg = reg->dyn_cast_field_region ();
> + known_possible_subclass_type 
> +   = DECL_CONTEXT (field_reg->get_field ());
> +  }
> +
> +for (cgraph_node *x : targets)
> +  {
> +if (DECL_CONTEXT (x->decl) == known_possible_subclass_type)
> +  most_propbable_taget = x->decl;
> +  }
> +return most_propbable_taget;
> +  }
> +   }
> +
>return NULL_TREE;
>  }

Dave




Re: Noob question about simple customization of GCC.

2021-08-05 Thread David Malcolm via Gcc
On Wed, 2021-08-04 at 00:17 -0700, Alacaster Soi via Gcc wrote:
> How hard would it be to add a tree-like structure and
> headers/sections to
> the -v gcc option so you can see the call structure. Would this be a
> reasonable first contribution/customization for a noob? It'll be a
> while
> before I can reasonably work on this.
> GCC
> version
> config
> >  cc1 main.c
>   | cc1 config and
>   | output
> -> tempfile.s
>     '*extra space' *between each
> lowest
> level command
> >  as -v
>   | output
> -> tempfile.o
> 
> >  collect2.exe
>   | output
>   |- ld.exe
>  | output
> -> tempfile.exe
> 

I really like this UI idea, but I don't know how easy/hard it would be
to implement.  The code that implements figuring out what to invoke
(the "driver") is in gcc/gcc.c, which is a big source file.

FWIW there's also code in gcc/tree-diagnostic-path.cc to emit ASCII art
that does something a bit similar to your idea, which might be worth
looking at (in this case, to visualize function calls and returns along
a code path).

Hope this is helpful
Dave



Re: daily report on extending static analyzer project [GSoC]

2021-08-05 Thread David Malcolm via Gcc
On Thu, 2021-08-05 at 20:27 +0530, Ankur Saini wrote:
> 
> 
> > On 05-Aug-2021, at 4:56 AM, David Malcolm 
> > wrote:
> > 
> > On Wed, 2021-08-04 at 21:32 +0530, Ankur Saini wrote:
> > 
> > [...snip...]
> > > 
> > > - From observation, a typical vfunc call that isn't devirtualised
> > > by
> > > the compiler's front end looks something like this 
> > > "OBJ_TYPE_REF(_2;(struct A)a_ptr_5(D)->0) (a_ptr_5(D))"
> > > where "a_ptr_5(D)" is pointer that is being used to call the
> > > virtual
> > > function.
> > > 
> > > - We can access it's region to see what is the type of the object
> > > the
> > > pointer is actually pointing to.
> > > 
> > > - This is then used to find a call with DECL_CONTEXT of the object
> > > from the all the possible targets of that polymorphic call.
> > 
> > [...]
> > 
> > > 
> > > Patch file ( prototype ) : 
> > > 
> > 
> > > +  /* Call is possibly a polymorphic call.
> > > +  
> > > + In such case, use devirtisation tools to find 
> > > + possible callees of this function call.  */
> > > +  
> > > +  function *fun = get_current_function ();
> > > +  gcall *stmt  = const_cast (call);
> > > +  cgraph_edge *e = cgraph_node::get (fun->decl)->get_edge (stmt);
> > > +  if (e->indirect_info->polymorphic)
> > > +  {
> > > +    void *cache_token;
> > > +    bool final;
> > > +    vec  targets
> > > +  = possible_polymorphic_call_targets (e, &final,
> > > &cache_token, true);
> > > +    if (!targets.is_empty ())
> > > +  {
> > > +    tree most_propbable_taget = NULL_TREE;
> > > +    if(targets.length () == 1)
> > > +   return targets[0]->decl;
> > > +    
> > > +    /* From the current state, check which subclass the
> > > pointer that 
> > > +   is being used to this polymorphic call points to, and
> > > use to
> > > +   filter out correct function call.  */
> > > +    tree t_val = gimple_call_arg (call, 0);
> > 
> > Maybe rename to "this_expr"?
> > 
> > 
> > > +    const svalue *sval = get_rvalue (t_val, ctxt);
> > 
> > and "this_sval"?
> 
> ok
> 
> > 
> > ...assuming that that's what the value is.
> > 
> > Probably should reject the case where there are zero arguments.
> 
> Ideally it should always have one argument representing the pointer
> used to call the function. 
> 
> for example, if the function is called like this : -
> 
> a_ptr->foo(arg);  // where foo() is a virtual function and a_ptr is a
> pointer to an object of a subclass.
> 
> I saw that its GIMPLE representation is as follows : -
> 
> OBJ_TYPE_REF(_2;(struct A)a_ptr_5(D)->0) (a_ptr_5, arg);
> 
> > 
> > 
> > > +
> > > +    const region *reg
> > > +  = [&]()->const region *
> > > +  {
> > > +    switch (sval->get_kind ())
> > > +  {
> > > +    case SK_INITIAL:
> > > +  {
> > > +    const initial_svalue *initial_sval
> > > +  = sval->dyn_cast_initial_svalue ();
> > > +    return initial_sval->get_region ();
> > > +  }
> > > +  break;
> > > +    case SK_REGION:
> > > +  {
> > > +    const region_svalue *region_sval 
> > > +  = sval->dyn_cast_region_svalue ();
> > > +    return region_sval->get_pointee ();
> > > +  }
> > > +  break;
> > > +
> > > +    default:
> > > +  return NULL;
> > > +  }
> > > +  } ();
> > 
> > I think the above should probably be a subroutine.
> > 
> > That said, it's not clear to me what it's doing, or that this is
> > correct.
> 
> 
> Sorry, I think I should have explained it earlier.
> 
> Let's take an example code snippet :- 
> 
> Derived d;
> Base *base_ptr;
> base_ptr = &d;
> base_ptr->foo();// where foo() is a virtual function
> 
> This generates the following GIMPLE dump :-
> 
> Derived::Derived (&d);
> base_ptr_6 = &d.D.3779;
> _1 = base_ptr_6->_vptr.Base;
> _2 = _1 + 8;
> _3 = *_2;
> OBJ_TYPE_REF(_3;(struct Base)base_ptr_6->1) (base_ptr_6);

I did a bit of playing with this example, and tried adding:

1876case OBJ_TYPE_REF:
1877  gcc_unreachable ();
1878  break;

to region_model::get_rvalue_1, and running cc1plus under the debugger.

The debugger hits the "gcc_unreachable ();", at this stmt:

 OBJ_TYPE_REF(_2;(struct Base)base_ptr_5->0) (base_ptr_5);

Looking at the region_model with region_model::debug() shows:

(gdb) call debug()
stack depth: 1
  frame (index 0): frame: ‘test’@1
clusters within frame: ‘test’@1
  cluster for: Derived d
key:   {bytes 0-7}
value: ‘int (*) () *’ {(&constexpr int (* Derived::_ZTV7Derived 
[3])(...)+(sizetype)16)}
  cluster for: base_ptr_5: &Derived d.
  cluster for: _2: &‘foo’
m_called_unknown_fn: FALSE
constraint_manager:
  equiv classes:
ec0: {&Derived d.}
ec1: {&

Re: Analyzer tests fail on windows

2021-08-23 Thread David Malcolm via Gcc
On Mon, 2021-08-23 at 09:52 -1000, NightStrike wrote:
> David,
> 
> Many of the analyzer tests fail on windows because they hardcode in
> the
> typedef of size_t to be unsigned long. This is not a platform
> independent
> definition, though, and is wrong for 64 bit windows. This causes
> extra
> warnings that all of the functions using size_t arguments are wrong,
> because they need to be unsigned long long.
> 
> Is there either 1) a built-in type you can use, like __SIZE_T__ if
> that's such a thing, or 2) can you just include stddef.h instead of
> manually putting the typedef at the top of each test?

Which tests are failing, specifically?

In many analyzer tests I'm using __SIZE_TYPE__ or stddef.h, however
I've recently added various tests reduced from the Linux kernel on
x86_64 which use unsigned long - maybe I need to rethink those.
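
For the tests that do spell out the typedef, the usual portable idiom is
along these lines, rather than hardcoding "unsigned long":

  /* Let the target define what size_t is, instead of hardcoding it.  */
  typedef __SIZE_TYPE__ size_t;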

Thanks
Dave




Re: Analyzer tests fail on windows

2021-08-24 Thread David Malcolm via Gcc
On Mon, 2021-08-23 at 22:36 -0400, NightStrike wrote:
> On Mon, Aug 23, 2021 at 8:16 PM NightStrike 
> wrote:
> > On Mon, Aug 23, 2021 at 4:09 PM David Malcolm 
> > wrote:
> > > Which tests are failing, specifically?
> 
> Here's the full list of all 37 failures that fail for any reason:
> 
> FAIL: gcc.dg/analyzer/dot-output.c dg-check-dot dot-output.c.state-
> purge.dot
> FAIL: gcc.dg/analyzer/malloc-callbacks.c (test for excess errors)
> FAIL: gcc.dg/analyzer/pr98969.c  (test for warnings, line 17)
> FAIL: gcc.dg/analyzer/pr98969.c (test for excess errors)
> FAIL: gcc.dg/analyzer/pr99716-1.c  (test for warnings, line 25)
> FAIL: gcc.dg/analyzer/pr99716-2.c (test for excess errors)
> FAIL: gcc.dg/analyzer/pr99774-1.c (test for excess errors)
> FAIL: gcc.dg/analyzer/sensitive-1.c  (test for warnings, line 16)
> FAIL: gcc.dg/analyzer/sensitive-1.c warning (test for warnings, line
> 17)
> FAIL: gcc.dg/analyzer/sensitive-1.c event (test for warnings, line
> 17)
> FAIL: gcc.dg/analyzer/sensitive-1.c  (test for warnings, line 23)
> FAIL: gcc.dg/analyzer/sensitive-1.c  (test for warnings, line 24)
> FAIL: gcc.dg/analyzer/sensitive-1.c event (test for warnings, line
> 24)
> FAIL: gcc.dg/analyzer/sensitive-1.c  (test for warnings, line 30)
> FAIL: gcc.dg/analyzer/sensitive-1.c warning (test for warnings, line
> 31)
> FAIL: gcc.dg/analyzer/sensitive-1.c event (test for warnings, line
> 31)
> FAIL: gcc.dg/analyzer/sensitive-1.c  (test for warnings, line 44)
> FAIL: gcc.dg/analyzer/sensitive-1.c  (test for warnings, line 50)
> FAIL: gcc.dg/analyzer/sensitive-1.c  (test for warnings, line 55)
> FAIL: gcc.dg/analyzer/sensitive-1.c  (test for warnings, line 60)
> FAIL: gcc.dg/analyzer/sensitive-1.c  (test for warnings, line 61)
> FAIL: gcc.dg/analyzer/signal-1.c  (test for warnings, line 13)
> FAIL: gcc.dg/analyzer/signal-1.c  (test for warnings, line 25)
> FAIL: gcc.dg/analyzer/signal-2.c  (test for warnings, line 16)
> FAIL: gcc.dg/analyzer/signal-2.c  (test for warnings, line 28)
> FAIL: gcc.dg/analyzer/signal-3.c  (test for warnings, line 10)
> FAIL: gcc.dg/analyzer/signal-3.c  (test for warnings, line 21)
> FAIL: gcc.dg/analyzer/signal-4a.c  (test for warnings, line 15)
> FAIL: gcc.dg/analyzer/signal-4a.c expected multiline pattern lines
> 33-75 not found: 
> FAIL: gcc.dg/analyzer/signal-6.c  (test for warnings, line 11)
> FAIL: gcc.dg/analyzer/signal-6.c  (test for warnings, line 16)
> FAIL: gcc.dg/analyzer/signal-registration-loc.c  (test for warnings,
> line 15)
> FAIL: gcc.dg/analyzer/signal-registration-loc.c  (test for warnings,
> line 21)
> FAIL: gcc.dg/analyzer/strndup-1.c (test for excess errors)
> FAIL: gcc.dg/analyzer/zlib-5.c (test for excess errors)
> 
> Of those, here is what I diagnosed so far:
> pr98969.c:9:19: warning: cast to pointer from integer of different
> size [-Wint-to-pointer-cast]
> * This fails because the function arguments are "long int", and that
> tries to hold a pointer.  It should be uintptr_t or similar.
> 
> pr98969.c:17:3: warning: double-'free' of '*((struct foo *)(long long
> int)i).expr' [CWE-415] [-Wanalyzer-double-free]
> * My guess is that the regex is not right for running under wine,
> because that shouldn't be an excess error.
> 
> pr99716-2.c:13:30: warning: implicit declaration of function
> 'random';
> did you mean 'rand'? [-Wimplicit-function-declaration]
> * The warning is probably right here.  The C function is rand().
> Where does random() come from?
> 
> pr99774-1.c:12:14: warning: conflicting types for built-in function
> 'calloc'; expected 'void *(long long unsigned int,  long long
> unsigned
> int)' [-Wbuiltin-declaration-mismatch]
> * size_t issue
> 
> strndup-1.c:9:13: warning: incompatible implicit declaration of
> built-in function 'strndup' [-Wbuiltin-declaration-mismatch]
> * This function doesn't exist on windows.  So, either we add it to
> libmingwex if it isn't already there and then link that library in to
> the test, or just mark it as unsupported.  I'd probably prefer the
> former, but it's not up to me.
> 
> zlib-5.c:10:15: warning: conflicting types for built-in function
> 'strlen'; expected 'long long unsigned int(const char *)'
> [-Wbuiltin-declaration-mismatch]
> zlib-5.c:16:14: warning: conflicting types for built-in function
> 'calloc'; expected 'void *(long long unsigned int,  long long
> unsigned
> int)' [-Wbuiltin-declaration-mismatch]
> * size_t issue
> 
> gcc.dg/analyzer/gzio-3.c:
> gcc.dg/analyzer/gzio-3a.c:
> * For some reason, these work.  Maybe fread() isn't a builtin? Maybe
> there's a way to make gcc emit a warning when fread() is redefined
> differently.

Thanks for working through the above.

Do you have an account in GCC's bugzilla?  If so, please can you turn
this into a bug report there.  Is there a recipe for testing this via
wine?  (it's been almost 20 years since I did any Windows coding...)

Dave




Re: Error when accessing git read-only archive

2021-09-15 Thread David Malcolm via Gcc
On Mon, 2021-09-13 at 14:03 +0100, Jonathan Wakely via Gcc wrote:
> On Mon, 13 Sept 2021 at 14:01, Jonathan Wakely 
> wrote:
> > 
> > On Mon, 13 Sept 2021 at 13:53, Thomas Koenig via Gcc <
> > gcc@gcc.gnu.org> wrote:
> > > 
> > > Hi,
> > > 
> > > I just got an error when accessing the gcc git pages at
> > > https://gcc.gnu.org/git/gitweb.cgi?p=gcc.git , it is:
> > > 
> > > This page contains the following errors:
> > > error on line 91 at column 6: XML declaration allowed only at the
> > > start
> > > of the document
> > > Below is a rendering of the page up to the first error.
> > 
> > The web server seems to restart the page in the middle of the HTML,
> > the content contains:
> > 
> > 
> > 
> > Content-type: text/html
> > 
> > 
> >  > "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd";>
> > http://www.w3.org/1999/xhtml"; xml:lang="en-US" lang="en-
> > US">
> 
> Ah, the "second" page it's trying to display (in the middle of the
> first) is an error:
> 
> 
> 
> 500 - Internal Server Error
> 
> 
> Wide character in subroutine entry at /var/www/git/gitweb.cgi line
> 2208.
> 
> 

Summarizing some notes from IRC:

The last commit it manages to print successfully in that log seems to
be:
  c012297c9d5dfb177adf1423bdd05e5f4b87e5ec
so it appears that:
  f42e95a830ab48e59389065ce79a013a519646f1
is triggering the issue, and indeed
  
https://gcc.gnu.org/git/?p=gcc.git;a=commit;h=f42e95a830ab48e59389065ce79a013a519646f1
fails in a similar way, whereas other commits work.

It appears to be due to the "ł" character in the email address of the
Author, in that:

commit c012297c9d5dfb177adf1423bdd05e5f4b87e5ec
Author: Jan-Benedict Glaw 

works, whereas:

commit f42e95a830ab48e59389065ce79a013a519646f1
Author: Jan-Benedict Glaw 

doesn't.

git show f42e95a830ab48e59389065ce79a013a519646f1 | hexdump -C

shows:

0030  41 75 74 68 6f 72 3a 20  4a 61 6e 2d 42 65 6e 65  |Author: Jan-Bene|
0040  64 69 63 74 20 47 6c 61  77 20 3c 6a 62 67 6c 61  |dict Glaw .D|
0060  61 74 65 3a 20 20 20 4d  6f 6e 20 53 65 70 20 31  |ate:   Mon Sep 1|

i.e. we have the two bytes 0xc5 0x82, which is the UTF-8 encoding of "ł".


$ git format-patch 
c012297c9d5dfb177adf1423bdd05e5f4b87e5ec^^..c012297c9d5dfb177adf1423bdd05e5f4b87e5ec
0001-Fix-multi-statment-macro.patch
0002-cr16-elf-is-now-obsoleted.patch
$ file *.patch
0001-Fix-multi-statment-macro.patch:  unified diff output, UTF-8 Unicode text
0002-cr16-elf-is-now-obsoleted.patch: unified diff output, ASCII text


Hope this is helpful
Dave



Re: Extracting function name from the gimple call statement

2021-10-10 Thread David Malcolm via Gcc
On Sun, 2021-10-10 at 23:04 +0530, Shubham Narlawar via Gcc wrote:
> Hello,
> 
> Is there a direct way to print the name of the function call in gimple
> call
> statement?
> 
> For example -
> 
> void bar() {
> a = foo();    //gimple* stmt
> }
> 
> I want to print "foo" from the above gimple*.
> 
> I traced debug_gimple_stmt(gimple*) but it seems complex to just print
> "foo".

Bear in mind that not every gimple call is calling a specific function;
it could be a jump through a function pointer.

  tree fn_ptr = gimple_call_fn (call);

However, for simple cases like the above, fn_ptr will be an ADDR_EXPR
node, and the zeroth operand of the ADDR_EXPR node will get you the
fndecl (of "foo").
  tree fn_decl = TREE_OPERAND (fn_ptr, 0);

Given a decl, you can then use:
  tree identifier = DECL_NAME (fn_decl);
to get the identifier node for the decl ("foo").

Finally, you can use
  const char *str = IDENTIFIER_POINTER (identifier)
to get a 0-terminated string from the identifier that you can print.
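
FWIW, here's a minimal sketch putting those steps together (untested,
with only minimal error handling; the helper name is made up, and it
returns NULL for calls that don't go through a plain function address):

static const char *
get_called_function_name (const gcall *call)
{
  tree fn_ptr = gimple_call_fn (call);
  if (!fn_ptr || TREE_CODE (fn_ptr) != ADDR_EXPR)
    return NULL;  /* e.g. an indirect call through a function pointer.  */
  tree fn_decl = TREE_OPERAND (fn_ptr, 0);
  if (TREE_CODE (fn_decl) != FUNCTION_DECL)
    return NULL;
  tree identifier = DECL_NAME (fn_decl);
  return identifier ? IDENTIFIER_POINTER (identifier) : NULL;
}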

Hope this is helpful
Dave



Re: C-family selftests in language-independant source files

2021-11-05 Thread David Malcolm via Gcc
On Fri, 2021-11-05 at 10:38 +0100, cohenarthur.dev via Gcc wrote:
> Hi everyone,
> 
> We have been trying to enable the use of selftests for the rust
> frontend
> over at gccrs. While doing this, I have realized that a few tests from
> language-independant source files such as `opt-problem.c` and
> `diagnostic.c` actually rely on the compiler being a C one.
> 
> For example, one test asserts multiple times that a dumped text actually
> contains the "int" keyword for type assertions, which is never present
> in gccrs's error messages.
> 
> In order to enable the selftests, I have added the following line to
> our
> rust/Make-lang.in, among others:
> 
> RUST_SELFTEST_FLAGS = -xr $(SELFTEST_FLAGS)
> 
> Passing -xc instead enables the opt-problem and diagnostic tests to
> pass, but causes our tests to not run. Passing -xrs causes our tests to
> run, but the opt-problem and diagnostic selftests to fail.
> 
> Any idea as to how to disable those tests? Or make it so that they are
> only ran when running C/C++ selftests?

If a selftest should only be run for a given language, there's a
langhook called by selftest::run_tests:

  /* Run any lang-specific selftests.  */
  lang_hooks.run_lang_selftests ();

which e.g. the C frontend implements in gcc/c/c-lang.c as:

#if CHECKING_P
#undef LANG_HOOKS_RUN_LANG_SELFTESTS
#define LANG_HOOKS_RUN_LANG_SELFTESTS selftest::run_c_tests
#endif /* #if CHECKING_P */

which currently merely calls
  c_family_tests ();
which is defined in gcc/c-family/c-common.c

So the invocation of c-family-specific tests could be moved to there
(or to the appropriate locations for C/C++ tests), splitting things up
as appropriate based on how much of each file's selftest suite is lang-
specific.
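
On the gccrs side, the wiring might look roughly like this (a sketch
only: "run_rust_tests" is a made-up name, and I haven't looked at how
your frontend's lang-hooks file is laid out):

#if CHECKING_P

namespace selftest {

/* Run all of the Rust-frontend-specific selftests.  */

static void
run_rust_tests ()
{
  /* Calls to Rust-specific selftest functions would go here.  */
}

} // namespace selftest

#undef LANG_HOOKS_RUN_LANG_SELFTESTS
#define LANG_HOOKS_RUN_LANG_SELFTESTS selftest::run_rust_tests

#endif /* #if CHECKING_P */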


If it's just one assert within a larger selftest that's problematic,
maybe we can conditionalize it individually, though I'm not sure of a
good way to do that off the top of my head.


Hope this is helpful
Dave



Re: odd internal failure

2021-12-02 Thread David Malcolm via Gcc
On Thu, 2021-12-02 at 12:40 +0100, Richard Biener via Gcc wrote:
> On Wed, Dec 1, 2021 at 9:56 PM Gary Oblock 
> wrote:
> > 
> > Richard,
> > 
> > I rebuilt at "-O0" and that particular call now works but on a call
> > to
> > the same function with a different offset it fails. 😱
> 
> use a debugger to see why

In case you haven't seen them, I put together some tips on debugging
GCC here:
https://dmalcolm.fedorapeople.org/gcc/newbies-guide/debugging.html
https://github.com/davidmalcolm/gcc-newbies-guide/blob/master/debugging.rst

Inserting print statements only gets you so far; at some point you
really need a debugger.

Dave

> 
> > Thanks,
> > 
> > Gary
> > 
> > 
> > 
> > From: Richard Biener 
> > Sent: Wednesday, December 1, 2021 1:09 AM
> > To: Gary Oblock 
> > Cc: gcc@gcc.gnu.org 
> > Subject: Re: odd internal failure
> > 
> > [EXTERNAL EMAIL NOTICE: This email originated from an external
> > sender. Please be mindful of safe email handling and proprietary
> > information protection practices.]
> > 
> > 
> > On Wed, Dec 1, 2021 at 8:46 AM Gary Oblock via Gcc 
> > wrote:
> > > 
> > > What is happening should be trivial to determine but for some
> > > reason it's
> > > not. I'd normally bounce this off a coworker but given the pandemic
> > > and modern dispersed hiring practices it's not even remotely
> > > possible.
> > > 
> > > I'm making this call and tree_to_uhwi is failing on an internal
> > > error.
> > > That's normally easy to fix, but here is where the weirdness kicks
> > > in.
> > > 
> > >   unsigned HOST_WIDE_INT wi_offset = tree_to_uhwi (offset);
> > > 
> > > tree_to_uhwi from tree.h is:
> > > 
> > > extern inline __attribute__ ((__gnu_inline__)) unsigned
> > > HOST_WIDE_INT
> > > tree_to_uhwi (const_tree t)
> > > {
> > >   gcc_assert (tree_fits_uhwi_p (t));
> > >   return TREE_INT_CST_LOW (t);
> > > }
> > > 
> > > and
> > > 
> > > tree_fits_uhwi_p from tree.c is
> > > 
> > > bool
> > > tree_fits_uhwi_p (const_tree t)
> > > {
> > >   return (t != NULL_TREE
> > >  && TREE_CODE (t) == INTEGER_CST
> > >  && wi::fits_uhwi_p (wi::to_widest (t)));
> > > }
> > > 
> > > Here's what this instrumentation shows (DEBUG_A is an indenting
> > > fprintf to
> > > stderr.)
> > > 
> > >   DEBUG_A ("TREE_CODE(offset) = %s  && ", code_str (TREE_CODE
> > > (offset)));
> > >   DEBUG_A ("fits %s\n", wi::fits_uhwi_p (wi::to_widest (offset)) ?
> > > "true" : "false");
> > >   DEBUG_A ("tree_fits_uhwi_p(offset) %s\n",tree_fits_uhwi_p
> > > (offset) ? "true" : "false");
> > > 
> > >    TREE_CODE(offset) = INTEGER_CST  && fits true
> > >    tree_fits_uhwi_p(offset) true
> > > 
> > > By the way, offset is:
> > > 
> > > _Literal (struct BASKET * *) 8
> > > 
> > > And it's an operand of:
> > > 
> > > MEM[(struct BASKET * *)&perm + 8B]
> > > 
> > > Any clues on what's going on here?
> > 
> > it should just work.
> > 
> > > Thanks,
> > > 
> > > Gary
> > > 
> > 
> > Btw, try to setup things so you don't spam below stuff to public
> > mailing lists.
> > 
> > > CONFIDENTIALITY NOTICE: This e-mail message, including any
> > > attachments, is for the sole use of the intended recipient(s) and
> > > contains information that is confidential and proprietary to Ampere
> > > Computing or its subsidiaries. It is to be used solely for the
> > > purpose of furthering the parties' business relationship. Any
> > > unauthorized review, copying, or distribution of this email (or any
> > > attachments thereto) is strictly prohibited. If you are not the
> > > intended recipient, please contact the sender immediately and
> > > permanently delete the original and any copies of this email and
> > > any attachments thereto.
> 




Re: Mass rename of C++ .c files to .cc suffix?

2022-01-07 Thread David Malcolm via Gcc
On Fri, 2022-01-07 at 11:25 +0100, Martin Jambor wrote:
> Hi,
> 
> Would anyone be terribly against mass renaming all *.c files (that are
> actually C++ files) within the gcc subdirectory to ones with .cc
> suffix?
> 
> We already have 47 files with suffix .cc directly in the gcc
> subdirectory and 160 if we also count those in (non-testsuite)
> subdirectories, while the majority of our non-header C++ files still
> has
> the .c suffix.
> 
> I have already missed stuff when grepping because I did not include
> *.cc
> files and the inconsistency is also just ugly and must be very
> confusing
> to anyone who encounters it for the first time.
> 
> Since we have switched to git, this should have quite small effect on
> anyone who does their development on branches.  With Martin Liška we
> did
> a few experiments and git blame, git rebase and even git gcc-backport
> worked seamlessly across a rename.
> 
> I would be fine waiting with it until GCC 12 gets released but see
> little value in doing so.
> 
> What do others think?  (Any important caveats I might have missed?)

+1 from me.

Various details:

Presumably the generated files should also change from .c to .cc (e.g.
gengtype generates a gtype-desc.c which is actually C++).

grep for "files_rules" in gengtype: it seems to have some hardcoded
regex patterns that end in "\\.c" (not sure what these do, but should
be investigated w.r.t. a renaming to .cc)

A minor detail that would be nice to get right: the selftests manually
code the names of source files in function names; see
selftest::run_tests in selftest-run-tests.c, which has lots of calls to
functions of the form "foo_c_tests ();"  These function names should
probably be renamed to "foo_cc_tests" if "foo.c" is renamed to
"foo.cc".  Though perhaps that can wait to a followup, or be a separate
commit, if that helps with backporting.

Hope this is constructive
Dave




Re: GCC GSoC 2022: Call for project ideas and mentors

2022-01-07 Thread David Malcolm via Gcc
On Thu, 2022-01-06 at 17:20 +0100, Martin Jambor wrote:
> Hello,
> 
> another year is upon us and Google has announced there will be again
> Google Summer of Code 2022 (though AFAIK there is no specific timeline
> yet).  I'd like to volunteer to be the main Org Admin for GCC again so
> let me know if you think I shouldn't or that someone else should, but
> otherwise I'll assume that I will.
> 
> There will be a few important changes to the GSoC this year.  The most
> important for us is that there will be two project sizes: medium-sized
> projects which are expected to take about 175 hours to complete and
> large projects expected to take approximately 350 hours (the size from
> 2020 and earlier).  I expect that most of our projects will be large
> but
> I think we can offer one or two medium-sized ideas too.
> 
> Google will also increase timing flexibility, so the projects can run
> for longer (up to 22 weeks) allowing mentors to go on vacation and
> students to pause and focus on exams.  Talking about students, Google
> is
> going to open the program to all adults, so from now on, the
> participants working on the projects will be called GSoC contributors.
> 
> Slightly more information about these changes can be found at
> https://opensource.googleblog.com/2021/11/expanding-google-summer-of-code-in-2022.html
> I am sure we will learn more when the actual timeline is announced too.
> 
>  The most important bit:
> 
> 
> Even before that happens, I would like to ask all (moderately) seasoned
> GCC contributors to consider mentoring a student this year and ideally
> also come up with a project that they would like to lead.  I'm
> collecting proposal on our wiki page
> https://gcc.gnu.org/wiki/SummerOfCode - feel free to add yours to the
> top list there.  Or, if you are unsure, post your offer and project
> idea
> as a reply here to the mailing list.

How did it get to be 2022 already?

Thanks for organizing this.

I'd like to (again) mentor a project relating to the GCC static
analyzer:
  https://gcc.gnu.org/wiki/DavidMalcolm/StaticAnalyzer

I've updated the analyzer task ideas on:
  https://gcc.gnu.org/wiki/SummerOfCode
but the ideas there are suggestions; if any prospective candidate has
other good ideas for things worth working on within the analyzer, let
me know.

Alternatively, I'm also up for mentoring relating to diagnostics or
libgccjit, if someone can think of an idea of suitable size and scope
for a GSoC project.

Dave

> 
> ===
> ==
> 
> Eventually, each listed project idea should have a) a project
> title/description, b) more detailed description of the project (2-5
> sentences), c) expected outcomes, d) skills required/preferred, e)
> project size and difficulty and f) expected mentors.
> 
> Project ideas that come without an offer to also mentor them are always
> fun to discuss, by all means feel free to reply to this email with
> yours
> and I will attempt to find a mentor, but please be aware that we can
> only use the suggestion it if we actually find one.
> 
> Everybody in the GCC community is invited to go over
> https://gcc.gnu.org/wiki/SummerOfCode and remove any outdated or
> otherwise bad project suggestions and help improve viable ones.
> 
> Finally, please continue helping (prospective) students figure stuff
> out
> about GCC like you always do.  So far I think all of them enjoyed
> working with us, even if many sometimes struggled with GCC's
> complexity.
> 
> I will update you as more details about GSoC 2022 become available.
> 
> Thank you, let's hope we attract some new talent again this year.
> 
> Martin
> 




Re: Many analyzer failures on non-Linux system (x86_64-apple-darwin)

2022-01-10 Thread David Malcolm via Gcc
On Mon, 2022-01-10 at 17:13 +0100, FX wrote:
> Hi David,
> 
> May I kindly ping you on that? Or anyone with knowledge of the static
> analyzer?

Sorry about the delay in responding; I was on vacation and am still
getting caught up.

Various answers inline below...

> 
> Thanks,
> FX
> 
> 
> > Le 23 déc. 2021 à 22:49, FX  a écrit :
> > 
> > Hi David, hi everyone,
> > 
> > I’m trying to understand how best to fix or silence the several
> > failures in gcc.dg/analyzer that occur on x86_64-apple-darwin. Some
> > of them, according to gcc-testresults, also occur on other non-
> > Linux targets. See for example, the test results at  
> > https://gcc.gnu.org/pipermail/gcc-testresults/2021-December/743901.html


> > 
> > ## gcc.dg/analyzer/torture/asm-x86-linux-*.c
> > 
> > Are these supposed to be run only on Linux (as the name implies)?
> > Four of them fail on x86_64-apple-darwin, because they use assembly
> > that is not supported:
> > 
> > FAIL: gcc.dg/analyzer/torture/asm-x86-linux-cpuid-paravirt-1.c
> > FAIL: gcc.dg/analyzer/torture/asm-x86-linux-cpuid-paravirt-2.c
> > FAIL: gcc.dg/analyzer/torture/asm-x86-linux-rdmsr-paravirt.c
> > FAIL: gcc.dg/analyzer/torture/asm-x86-linux-wfx_get_ps_timeout-
> > full.c
> > 
> > Should they be restricted to Linux targets? There is another one
> > that has the same error, as well, although it doesn’t have linux in
> > the name:
> > 
> > FAIL: gcc.dg/analyzer/asm-x86-lp64-1.c

The purpose of these asm tests is to verify that the analyzer doesn't
get confused by various inline assembler directives used in the source
of the Linux kernel.  So in theory they ought to work on any host, with
a gcc configured for a suitable target.

These tests are marked with "dg-do assemble" directives, which I'd
hoped would mean it uses -S for the tests (to make a .s file), but
looking at a log locally, it appears to be using -c (to make a .o
file), so maybe that's what's going wrong for you as well?


> > 
> > 
> > ## Builtin-related failures
> > 
> > Those four cases fail:
> > 
> > gcc.dg/analyzer/data-model-1.c
> > 
> > gcc.dg/analyzer/pr103526.c
> > gcc.dg/analyzer/taint-size-1.c
> > gcc.dg/analyzer/write-to-string-literal-1.c
> > 
> > but pass if the function calls (memset and memcpy) are replaced by
> > the built-in variant (__builtin_memset and __builtin_memcpy). The
> > reason for that is that the darwin headers, in <secure/_string.h>
> > (included from <string.h>), do this:
> > 
> > #if __has_builtin(__builtin___memcpy_chk) || defined(__GNUC__)
> > #undef memcpy
> > /* void *memcpy(void *dst, const void *src, size_t n) */
> > #define memcpy(dest, ...) \
> >    __builtin___memcpy_chk (dest, __VA_ARGS__,
> > __darwin_obsz0 (dest))
> > #endif
> > 
> > where __darwin_obsz0 is defined thusly:
> > 
> > #define __darwin_obsz0(object) __builtin_object_size (object, 0)
> > 
> > 
> > Does the analyzer not handle the _chk builtin variants? Should it?
> > I’m happy to investigate more, but I’m not sure what to do.

Can you file a bug about this and attach the preprocessed source from
the test (using -E)?

Thanks
Dave



Re: GSoC: Working on the static analyzer

2022-01-11 Thread David Malcolm via Gcc
On Tue, 2022-01-11 at 11:03 +0530, Mir Immad via Gcc wrote:
> Hi everyone,

Hi, and welcome.

> I intend to work on the static analyzer. Are these documents enough to
> get
> started: https://gcc.gnu.org/onlinedocs/gccint and
> https://gcc.gnu.org/onlinedocs/gccint/Analyzer-Internals.html#Analyzer-Internals

Yes.

There are also some high-level notes here:
  https://gcc.gnu.org/wiki/DavidMalcolm/StaticAnalyzer

Also, given that the analyzer is part of GCC, the more general
introductions to hacking on GCC will be useful.

I recommend creating a trivial C source file with a bug in it (e.g. a
3-line function with a use-after-free), and stepping through the
analyzer to get a sense of how it works.
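
For instance, something like this is enough (an untested sketch; any
small use-after-free will do), compiled with e.g. "gcc -fanalyzer -c test.c":

  #include <stdlib.h>

  void test (void)
  {
    int *p = (int *) malloc (sizeof (int));
    if (!p)
      return;
    *p = 42;
    free (p);
    *p = 17;  /* the analyzer should report a use-after-free here */
  }

You can then set breakpoints inside the analyzer code and step through
it to see how the diagnostic and its path get built up.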

Hope this is helpful; don't hesitate to ask questions.
Dave



Re: Accessing const parameter of GIMPLE_CALL

2022-01-16 Thread David Malcolm via Gcc
On Sun, 2022-01-16 at 18:52 +0530, Shubham Narlawar via Gcc wrote:
> Hello,

Hi; various notes inline below...

> 
> My aim is to iterate over gimple call stmt parameters and check
> whether it is constant or constant expression and mark/store them for
> some gimple transformation.
> 
> I have an intrinsic function call of the following -
> 
> __builtin_xyz(void*, 7, addr + 10);
> 
> I want to find its parameters which are either constant or constant
> expression i.e. 7 and addr + 10 from above case.

Gimple "flattens" all tree-like operations into a sequence of simple
operations, so I would expect the gimple for this to look something
like this:

   _tmp = addr + 10;
   __builtin_xyz (7, _tmp);

Your email doesn't specify *when* your code runs.

The IR for a function goes through several stages:

- an initial gimple IR without a CFG
- gimple with a CFG, but not in SSA
- gimple-SSA with a CFG
  (most of the gimple optimization passes operate in this form of the
IR)
- gimple with a CFG, but no longer in SSA form, immediately before
conversion to RTL-with-CFG form
- RTL-with-CFG
- RTL-without a CFG
- assembler

Are you doing it as part of a plugin, or modifying an existing pass? 
In either case, it's a good idea to dump the gimple and see what the
code has been turned into.  You'll probably find the following options
useful:
  -fdump-tree-all -fdump-gimple-all

or alternatively just turn it on for the pass that you're working on.

> 
> [1] I tried below macro but there is very less usage in the entire
> source code -
> 
> tree fn_ptr = gimple_call_fn (dyn_cast (stmt));    //stmt

gimple_call_fn returns the function that will be called, a pointer. 
This is very general, for handling things like jumps through function
pointers, but here you have the common case of a callsite that calls a
specific function, so "fn_ptr" here is:
   &__builtin_xyz
i.e. an ADDR_EXPR where operand 0 is the FUNCTION_DECL for the builtin.

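So, roughly (untested), for this direct-call case you can get at the
decl either by hand:

  tree fndecl = TREE_OPERAND (fn_ptr, 0);  /* operand 0 of the ADDR_EXPR */

or, more simply, via the helper:

  tree fndecl = gimple_call_fndecl (call); /* NULL_TREE for indirect calls */

(where "call" is the gcall for the statement).
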
> = gimple_call
> function_args_iterator iter;
> tree argtype;
> 
> if (TREE_CODE (fn_ptr) == ADDR_EXPR)
> {
>   FOREACH_FUNCTION_ARGS (fn_ptr, argtype, iter)

Looking in tree.h, FOREACH_FUNCTION_ARGS takes a FUNCTION_TYPE as its
first argument, but the code above is passing it the ADDR_EXPR wrapping
the FUNCTION_DECL.

Unfortunately, because these things are all of type "tree", this kind
of type mismatch doesn't get caught - unless you build gcc from source
(with --enable-checking=debug) in which case all these accesses are
checked at the compiler's run time (which is probably a good thing to
do if you're hoping to work on gcc for GSoC).

You can get the FUNCTION_TYPE of a FUNCTION_DECL via TREE_TYPE
(fndecl), or alternatively, gimple_call_fntype (call) will get the type
of the function expected at the call stmt (useful if there was a type
mismatch).

That said, FOREACH_FUNCTION_ARGS iterates through the types of the
params of the FUNCTION_TYPE, but it sounds like you want to be
iterating through the arguments at this particular *callsite*.

For that you can use
  gimple_call_num_args (call);
and
  gimple_call_arg (call, idx);

>     {
>     if (TREE_CONSTANT (argtype))
>    // Found a constant expression parameter
>     }
> }
> 
> The problem is I am getting only one parameter tree but there are 2
> constants in the above function call. Even if "addr + 10" is treated
> differently, I want to mark it for the transformation.

I think you're seeing the function pointer being called, rather than the
params.

> 
> a. Is the above correct method to iterate over function call
> parameters?

As noted above, it depends on whether you want to iterate over the
types of the parameters in the function's decl, or over the expressions
of the arguments at the callsite.  I believe the above explains how to
do each of these.
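
For the callsite case, an untested sketch might look something like
this (where "stmt" is the statement from your gimple walk; at -O0 or
before SSA the arguments may be plain decls rather than SSA names):

  if (gcall *call = dyn_cast <gcall *> (stmt))
    {
      for (unsigned i = 0; i < gimple_call_num_args (call); i++)
        {
          tree arg = gimple_call_arg (call, i);
          if (CONSTANT_CLASS_P (arg))
            {
              /* A literal constant at the callsite, e.g. the "7";
                 mark/record it here.  */
            }
          else if (TREE_CODE (arg) == SSA_NAME)
            {
              /* Something like "addr + 10" will have been flattened
                 into an SSA temporary by this point; look at
                 SSA_NAME_DEF_STMT (arg) if you need the defining
                 expression.  */
            }
        }
    }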

> b. Is there a different way to achieve the above goal?

If you're looking to get familiar with GCC's insides, I recommend
stepping through it in the debugger, rather than relying on injecting
print statements and recompiling, since the former makes it much easier
to spot mistakes like the one above (which we all make).

I've written a guide to debugging GCC here:

https://dmalcolm.fedorapeople.org/gcc/newbies-guide/debugging.html


Hope this is helpful
Dave



Re: GSoC: Working on the static analyzer

2022-01-16 Thread David Malcolm via Gcc
On Fri, 2022-01-14 at 22:15 +0530, Mir Immad wrote:
> HI David,
> I've been tinkering with the static analyzer for the last few days. I
> find
> the project of adding SARIF output to the analyzer intresting. I'm
> writing
> this to let you know that I'm trying to learn the codebase.
> Thank you.

Excellent.

BTW, I think adding SARIF output would involve working more with GCC's
diagnostics subsystem than with the static analyzer, since (in theory)
all of the static analyzer's output is passing through the diagnostics
subsystem - though the static analyzer is probably the only GCC
component generating diagnostic paths.

I'm happy to mentor such a project as I maintain both subsystems and
SARIF output would benefit both - but it would be rather tangential to
the analyzer - so if you had specifically wanted to be working on the
guts of the analyzer itself, you may want to pick a different
subproject.

The SARIF standard is rather long and complicated, and we would want to
be compatible with clang's implementation.

It would be very cool if gcc could also accept SARIF files as an
*input* format, and emit them as diagnostics; that might help with
debugging SARIF output.  (I have an old patch for adding JSON parsing
support to GCC that could be used as a starting point for this).

Hope the above makes sense
Dave

> 
> On Tue, Jan 11, 2022, 7:09 PM David Malcolm 
> wrote:
> 
> > On Tue, 2022-01-11 at 11:03 +0530, Mir Immad via Gcc wrote:
> > > Hi everyone,
> > 
> > Hi, and welcome.
> > 
> > > I intend to work on the static analyzer. Are these documents
> > > enough to
> > > get
> > > started: https://gcc.gnu.org/onlinedocs/gccint and
> > > 
> > https://gcc.gnu.org/onlinedocs/gccint/Analyzer-Internals.html#Analyzer-Internals
> > 
> > Yes.
> > 
> > There are also some high-level notes here:
> >   https://gcc.gnu.org/wiki/DavidMalcolm/StaticAnalyzer
> > 
> > Also, given that the analyzer is part of GCC, the more general
> > introductions to hacking on GCC will be useful.
> > 
> > I recommend creating a trivial C source file with a bug in it (e.g.
> > a
> > 3-line function with a use-after-free), and stepping through the
> > analyzer to get a sense of how it works.
> > 
> > Hope this is helpful; don't hesitate to ask questions.
> > Dave
> > 
> > 




[PATCH] testsuite: avoid analyzer asm failures on non-Linux

2022-01-20 Thread David Malcolm via Gcc
On Sun, 2022-01-16 at 12:11 +0100, FX wrote:
> > No, that's "dg-do compile" (as in "compile but don't assemble").
> 
> I can confirm that this patch:
> 
> diff --git a/gcc/testsuite/gcc.dg/analyzer/asm-x86-lp64-1.c
> b/gcc/testsuite/gcc.dg/analyzer/asm-x86-lp64-1.c
> index c235e22fd01..4730255bb3c 100644
> --- a/gcc/testsuite/gcc.dg/analyzer/asm-x86-lp64-1.c
> +++ b/gcc/testsuite/gcc.dg/analyzer/asm-x86-lp64-1.c
> @@ -1,4 +1,4 @@
> -/* { dg-do assemble { target x86_64-*-* } } */
> +/* { dg-do compile { target x86_64-*-* } } */
>  /* { dg-require-effective-target lp64 } */
>  
>  #include "analyzer-decls.h"
> 
> 
> fixes the gcc.dg/analyzer/asm-x86-lp64-1.c failure on
> x86_64-apple-darwin. The same is true of this one:
> 
> diff --git
> a/gcc/testsuite/gcc.dg/analyzer/torture/asm-x86-linux-wfx_get_ps_timeout-full.c
> b/gcc/testsuite/gcc.dg/analyzer/torture/asm-x86-linux-wfx_get_ps_timeout-full.c
> index e90dccf58dd..4cbf43206dc 100644
> ---
> a/gcc/testsuite/gcc.dg/analyzer/torture/asm-x86-linux-wfx_get_ps_timeout-full.c
> +++
> b/gcc/testsuite/gcc.dg/analyzer/torture/asm-x86-linux-wfx_get_ps_timeout-full.c
> @@ -1,4 +1,4 @@
> -/* { dg-do assemble { target x86_64-*-* } } */
> +/* { dg-do compile { target x86_64-*-* } } */
>  /* { dg-require-effective-target lp64 } */
>  /* { dg-additional-options "-fsanitize=bounds
> -fno-analyzer-call-summaries" } */
>  /* { dg-skip-if "" { *-*-* } { "-O0" } { "" } } */
> 
> 
> 
> These other three:
> FAIL: gcc.dg/analyzer/torture/asm-x86-linux-cpuid-paravirt-1.c
> FAIL: gcc.dg/analyzer/torture/asm-x86-linux-cpuid-paravirt-2.c
> FAIL: gcc.dg/analyzer/torture/asm-x86-linux-rdmsr-paravirt.c
> 
> still fail with dg-do compile, as explained, because the error comes
> from the C front-end, not the assembler:
> 
> /Users/fx/gcc/gcc/testsuite/gcc.dg/analyzer/torture/asm-x86-linux-cpuid-paravirt-1.c:27:3:
> warning: 'asm' operand 6 probably does not match constraints
> /Users/fx/gcc/gcc/testsuite/gcc.dg/analyzer/torture/asm-x86-linux-cpuid-paravirt-1.c:27:3:
> error: impossible constraint in 'asm'

Thanks.  I extended your patch as follows, which works successfully for
me on x86_64-pc-linux-gnu.

Does the following look OK for the analyzer asm failures on
x86_64-apple-darwin?

Dave

gcc/testsuite/ChangeLog:
* gcc.dg/analyzer/asm-x86-1.c: Use dg-do "compile" rather than
"assemble".
* gcc.dg/analyzer/asm-x86-lp64-1.c: Likewise.
* gcc.dg/analyzer/asm-x86-lp64-2.c: Likewise.
* gcc.dg/analyzer/torture/asm-x86-linux-array_index_mask_nospec.c:
Likewise.
* gcc.dg/analyzer/torture/asm-x86-linux-cpuid-paravirt-1.c:
Likewise, and restrict to x86_64-pc-linux-gnu.
* gcc.dg/analyzer/torture/asm-x86-linux-cpuid-paravirt-2.c: Likewise.
* gcc.dg/analyzer/torture/asm-x86-linux-cpuid.c: Use dg-do
"compile" rather than "assemble".
* gcc.dg/analyzer/torture/asm-x86-linux-rdmsr-paravirt.c:
Likewise, and restrict to x86_64-pc-linux-gnu.
* gcc.dg/analyzer/torture/asm-x86-linux-rdmsr.c: Use dg-do
"compile" rather than "assemble".
* gcc.dg/analyzer/torture/asm-x86-linux-wfx_get_ps_timeout-full.c:
Likewise.
* gcc.dg/analyzer/torture/asm-x86-linux-wfx_get_ps_timeout-reduced.c:
Likewise.

Signed-off-by: David Malcolm 
---
 gcc/testsuite/gcc.dg/analyzer/asm-x86-1.c   | 2 +-
 gcc/testsuite/gcc.dg/analyzer/asm-x86-lp64-1.c  | 2 +-
 gcc/testsuite/gcc.dg/analyzer/asm-x86-lp64-2.c  | 2 +-
 .../analyzer/torture/asm-x86-linux-array_index_mask_nospec.c| 2 +-
 .../gcc.dg/analyzer/torture/asm-x86-linux-cpuid-paravirt-1.c| 2 +-
 .../gcc.dg/analyzer/torture/asm-x86-linux-cpuid-paravirt-2.c| 2 +-
 gcc/testsuite/gcc.dg/analyzer/torture/asm-x86-linux-cpuid.c | 2 +-
 .../gcc.dg/analyzer/torture/asm-x86-linux-rdmsr-paravirt.c  | 2 +-
 gcc/testsuite/gcc.dg/analyzer/torture/asm-x86-linux-rdmsr.c | 2 +-
 .../analyzer/torture/asm-x86-linux-wfx_get_ps_timeout-full.c| 2 +-
 .../analyzer/torture/asm-x86-linux-wfx_get_ps_timeout-reduced.c | 2 +-
 11 files changed, 11 insertions(+), 11 deletions(-)

diff --git a/gcc/testsuite/gcc.dg/analyzer/asm-x86-1.c 
b/gcc/testsuite/gcc.dg/analyzer/asm-x86-1.c
index f6026b7e288..a3f86e440b5 100644
--- a/gcc/testsuite/gcc.dg/analyzer/asm-x86-1.c
+++ b/gcc/testsuite/gcc.dg/analyzer/asm-x86-1.c
@@ -1,4 +1,4 @@
-/* { dg-do assemble { target x86_64-*-* } } */
+/* { dg-do compile { target x86_64-*-* } } */
 
 #include "analyzer-decls.h"
 
diff --git a/gcc/testsuite/gcc.dg/analyzer/asm-x86-lp64-1.c 
b/gcc/testsuite/gcc.dg/analyzer/asm-x86-lp64-1.c
index c235e22fd01..4730255bb3c 100644
--- a/gcc/testsuite/gcc.dg/analyzer/asm-x86-lp64-1.c
+++ b/gcc/testsuite/gcc.dg/analyzer/asm-x86-lp64-1.c
@@ -1,4 +1,4 @@
-/* { dg-do assemble { target x86_64-*-* } } */
+/* { dg-do compile { target x86_64-*-* } } */
 /* { dg-require-effective-target lp64 } */
 
 #include "analyzer-decls.h"
diff -
