[GSoC] Interest in project ideas and request for a small patch

2023-03-15 Thread Benjamin Priour via Gcc
Hi,
I'm Benjamin and I have been looking around the mail list and the code base
since last December, where I had to write a simple Deca compiler within a
month as a team. Next paragraph is me introducing myself -ranting really-
for too long.

I am from France, in my Computer science master's first year, and very much
interested in compilers and operating systems. I was already considering
applying for the GSoC, but felt rather intimidated by the scale of it so I
kind of forgot about it until last week my university housed a gathering
event for companies to promote internships.
Few of them appealed to me, while my teacher cheered for me to try and
reach you out, so here I am.

Hence, even if it's two weeks late, I would like to express my interest in
contributing to the project ideas listed below, and request for small patch
I could work on.

- [Extend the static analysis pass] is the one I feel the most confident
in, but I see it is already quite envied.
- [Enable incremental LTO linking] - OK here it might be a bit far-fetched,
as my only experience about LTO at the moment is mostly consisting of the
two conference videos and the gcc internal doc. However the subject strikes
my as reaally interesting, and no matter the project I will have to
document myself.
- [Rust front-end] - Especially the metadata exports and the user errors
sections. Even though the languages I have the most familiarity with are C
and C++, I got a nascent interest in Rust. I still have to check the doc
way more than I care to admit, but I have a good grasp of the underlying
concepts.

I have experience working on a Python debugger during my previous
internship, as well as the full-time compiler project I did last January.
>From those experiences, reading Crafting Interpreters (R.Nystrom) and my
language theory courses, I'm certain I will enjoy contributing to this.

Oof, finally braced myself to reach you out, sadly almost too late, but I
just got a boulder-sized weight off my shoulders.
Thanks in advance! -  I tried to keep it short.


Re: [GSoC] Discussion of warnings and of a first patch (name-hiding)

2023-03-22 Thread Benjamin Priour via Gcc
Hi Jason,

Sorry for the delayed answer, I was in my exam period!

I've almost finished the patch (PR12341
), and wrote a testcase
to compare it against.
In it I use dejaGNU instructions of the format:
 /* { dg-warning "'C::bbb' might shadow 'B::bbb'." "" { target *-*-* }
C_shadowing_bbb } */
and run the test with:
make check-gcc-c++ RUNTESTFLAGS="-v -v --target_board=unix\{-m32,-m64\}
dg.exp=pr12341-1.C"
I get expected results on most target, except on target hppa*-*-hpux*.
I have been looking around the documentation, yet I cannot find a way to
run on that target specifically, especially considering the wildcards. How
to test GCC on a simulator  don't
specify that target in the bottom table.
Would you have any guidance on how to do so ? I would like to debug the
test on that platform.

Otherwise, I believe it is working just fine, I put it as an enhancement to
-Wshadow. Should I send it to gcc-patches ? I wasn't sure I was supposed to
since it's still buggy, so I preferred to refrain.


>   Up to now I have considered some extra warning

> 1- Warning for NULL smart pointers dereferencing. E.g. when a smart
> > pointer is left uninitialized yet is derefe0renced.
>
> I would expect that to be covered by (enhancements to?) the existing
> -Wuninitialized or -Wmaybe-uninitialized or
> -Wanalyzer-use-of-uninitialized-value.
>

> > 3- Use after move.
>
> Hmm, it might make sense to approach this as an enhancement to the
> uninitialized warnings whereby a move makes us consider the object no
> longer initialized, much like -Wuse-after-free and
> -Wanalyzer-use-after-free.
>
> > 4- Use references or raw pointers as parameters when ownership is not
> > considered. (For general use, take T* or T& arguments rather than smart
> > pointers
> > <
> https://isocpp.github.io/CppCoreGuidelines/CppCoreGuidelines#f7-for-general-use-take-t-or-t-arguments-rather-than-smart-pointers
> >)
>
> This seems feasible to implement in the front-end.
>
>
I will update on these warnings and a few questions on the analyzer
tomorrow, I now got some free time back.
Benjamin


[GSoC] A bunch of questions about sm-malloc behavior, and proposition of a GSoC subject.

2023-03-29 Thread Benjamin Priour via Gcc
Hi David,

I've been playing around with sm-malloc on C++ samples.
I've noticed double delete don't go under -Wanalyzer-double-free but within
-Wanalyzer-use-after-free scope.

With the reproducer ...
> struct A {};
>
> int main() {
> A* a = new A();
> delete a;
> delete a;
> return 0;
> }

... the analyzer diagnoses:
build/gcc/xg++ -Bbuild/gcc -S -fanalyzer
-Wanalyzer-mismatching-deallocation -Wanalyzer-use-after-free
-Wanalyzer-double-free -Wanalyzer-malloc-leak  double_delete_test.cpp
double_delete_test.cpp: In function ‘int main()’:
double_delete_test.cpp:7:1: warning: use after ‘delete’ of ‘a’ [CWE-416]
[-Wanalyzer-use-after-free]
7 | }
  | ^
  ‘int main()’: events 1-8
|
|3 | A* a = new A();
|  |  ^
|  |  |
|  |  (1) allocated here
|  |  (2) assuming ‘operator new(8)’ is non-NULL
|4 | delete a;
|  | 
|  | |  |
|  | |  (4) ...to here
|  | |  (5) deleted here
|  | (3) following ‘true’ branch...
|5 | delete a;
|  | 
|  | |  |
|  | |  (7) ...to here
|  | (6) following ‘true’ branch...
|6 | return 0;
|7 | }
|  | ~
|  | |
|  | (8) use after ‘delete’ of ‘a’; deleted at (5)
|
double_delete_test.cpp:3:18: warning: dereference of possibly-NULL
‘operator new(8)’ [CWE-690] [-Wanalyzer-possible-null-dereference]
3 | A* a = new A();
  |  ^
  ‘int main()’: events 1-2
|
|3 | A* a = new A();
|  |  ^
|  |  |
|  |  (1) this call could return NULL
|  |  (2) ‘operator new(8)’ could be NULL:
unchecked value from (1)
|


Debugging read out two things:
- During second call of 'on_deallocator_call' on delete, the state would be
'stop' instead of the expected 'freed'
- The call to set_next_state transitions malloc instead of delete from
'nonnull' to 'freed'.
I'm still trying to come up with why these two behaviors happens.

By the way, in the first call to 'on_deallocator_call' on delete, the state
is set to 'nonnull',
which conforms to C++ behavior for new. However, a
-Wanalyzer-possible-null-dereference is emitted at the expression 'new'.
I haven't yet figured out why, but I'm looking into it.


Another observation was in the distinction between delete and free in the
case of mixing them.
With 'a' being allocated with malloc:
> A* a = (A*) malloc(sizeof(A));
> free(a);
> delete a; // -Wanalyzer-use-after-free, OK, might expect warning for
double free though ?

but with allocation through new and inversion of free/delete
> A* a = new A();
> delete a;
> free(a); // No diagnostics at all from the analyzer, got
-Wmismatched-new-delete from the front-end though.

I believe this might come from a similar issue as above, but I still have
to investigate on that front.

I just noticed another unexpected behavior. Let's consider
> struct A {int x; int y;};
> void* foo() { A* a = (A*) __builtin_malloc(sizeof(A)); return a; }
> int main() {
> A* a2 = (A*) __builtin_malloc(sizeof(A));
> foo();
> return 0;
> }

Then a -Wanalyzer-malloc-leak is correctly ensued for allocation in foo(),
but none is emitted for the leak in main(), although I checked the exploded
graph it is correctly marked as unchecked(free).

Should I file those on bugzilla ?


Also I have taken interest in PR106388 - Support for use-after-move in
-fanalyzer -.
The prerequisite PR106386 - Reuse libstdc++ assertions - would also be of
great help in extending the support of -Wanalyzer-out-of-bounds,
as we could detect out-of-bounds on vectors through
__glibcxx_requires_subscript used in the definition of operator[].

I already thought about a few ideas to implement that, but I'll develop
them more and try to come up with some proof of concept before sending them
to you.
Hopefully by tomorrow I'll update on this, I'll preferably hop to bed now
haha.
Do you think this could make a suitable GSoC subject ?


Best,
Benjamin.


[GSoC][analyzer-c++] Enable enough C++ support for self-analysis

2023-03-31 Thread Benjamin Priour via Gcc
Hi David,


On Thu, Mar 30, 2023 at 2:04 AM David Malcolm  wrote:

> Note that the analyzer doesn't properly work yet on C++; see:
>   https://gcc.gnu.org/bugzilla/showdependencytree.cgi?id=97110
> I'm hoping to address much of this in GCC 14.
>
> Other notes inline below...
>
> I'm guessing that the CFG and thus the supergraph contains edges for
> handling exceptions, and the analyzer currently doesn't know anything
> about these, and mishandled them:
>   https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97111
>
> You might want to have a look at the supergraph (via -fdump-analyzer-
> supergraph); I recently added more debugging notes here:
>  https://gcc.gnu.org/onlinedocs/gccint/Debugging-the-Analyzer.html
>
> If that's the case, then -fno-exceptions might affect the behavior.
>
> Thanks for the additional debugging tips!

> >
> >
> > Another observation was in the distinction between delete and free in
> > the
> > case of mixing them.
> > With 'a' being allocated with malloc:
> > > A* a = (A*) malloc(sizeof(A));
> > > free(a);
> > > delete a; // -Wanalyzer-use-after-free, OK, might expect warning
> > > for
> > double free though ?
> >
> > but with allocation through new and inversion of free/delete
> > > A* a = new A();
> > > delete a;
> > > free(a); // No diagnostics at all from the analyzer, got
> > -Wmismatched-new-delete from the front-end though.
> >
> > I believe this might come from a similar issue as above, but I still
> > have
> > to investigate on that front.
> >
> > I just noticed another unexpected behavior. Let's consider
> > > struct A {int x; int y;};
> > > void* foo() { A* a = (A*) __builtin_malloc(sizeof(A)); return a; }
> > > int main() {
> > > A* a2 = (A*) __builtin_malloc(sizeof(A));
> > > foo();
> > > return 0;
> > > }
> >
> > Then a -Wanalyzer-malloc-leak is correctly ensued for allocation in
> > foo(),
> > but none is emitted for the leak in main(), although I checked the
> > exploded
> > graph it is correctly marked as unchecked(free).
> >
> > Should I file those on bugzilla ?
>
>


> We already know that "C++ doesn't work yet".  Minimal examples of
> problems with C++ support might be worth filing, to isolate specific
> issues.  If you do, please add them to the tracker bug (by adding
> "analyzer-c++" to the "Blocks" field of the new bug(s))
>
> Sure will be doing that.


> >
> >
> > Also I have taken interest in PR106388 - Support for use-after-move
> > in
> > -fanalyzer -.
> > The prerequisite PR106386 - Reuse libstdc++ assertions - would also
> > be of
> > great help in extending the support of -Wanalyzer-out-of-bounds,
> > as we could detect out-of-bounds on vectors through
> > __glibcxx_requires_subscript used in the definition of operator[].
>
> There's a bunch of other C++-enablement work (as per the tracker bug
> above) that I think would need fixing before PR106388 is reasonable to
> tackle.
>

Indeed a bunch of them, the ramification of solving that PR got me too
excited.


> >
> > I already thought about a few ideas to implement that, but I'll
> > develop
> > them more and try to come up with some proof of concept before
> > sending them
> > to you.
> > Hopefully by tomorrow I'll update on this, I'll preferably hop to bed
> > now
> > haha.
> > Do you think this could make a suitable GSoC subject ?
>
> I think working on the C++ enablement prerequisites in the analyzer
> would make more sense.  I'd planned to do this myself for GCC 14, but
> there are plenty of other tasks I could work on if you want to tackle
> C++ support as a GSoC project for GCC 14.
>

Yes, I gladly would.


> A good C++ project might be: enable enough C++ support in the analyzer
> for it to be able to analyze itself.  This could be quite a large,
> difficult project, though it sidesteps having to support exception-
> handling, since we build ourselves with exception-handling disabled.
>
> Hope this is helpful
> Dave
>
>
To that purpose,  the following order of resolutions would make sense into
achieving that:
0. An emphasis on reducing the amount of exploded nodes will have to be put
from the beginning. All of my C++ samples produce graphs quite dense.
1. First thing first, extending the current C tests to cover C++ PR96395

(1.bis )
2. I would then go with supporting the options relative to sm-malloc:
  - -Wanalyser-double-free should behave properly (cf the fresh PR109365
 )
  - Add proper support of the non-throwing new operators. At the moment,
any new operators are supposed non-throwing (on_allocator_call, param
returns_nonnull is implicitly set to false). Actually handling this would
result in less false positive of -Wanalyzer-possible-null-dereference I
observed by tweaking it to true. Indeed currently every throwing new
operators get this warning. This would fit the bill of PR94355
 and maybe the new
placement sizes could be ha

Re: [GSoC][analyzer-c++] Submission of a draft proposal

2023-04-03 Thread Benjamin Priour via Gcc
Hi David,

On Mon, Apr 3, 2023 at 12:38 AM David Malcolm  wrote:

>
> To be fair, C ones can be as well; the analyzer's exploded graphs tend
> to get very big on anything but the most trivial examples.
>
>
>
[...snip...]


>
> Indeed - you'll have to do a lot of looking at gimple IR dumps, what
> the supergraph looks like, etc, for all of this.
>
>
Yep. I have already tried my hands on them, but to be fair I'm still often
troubled by them. Still,
doing so have already provided essential insight on the analyzer.

[...snip...]


> > 4. Extension of out-of-bounds
> >  ( - Extending -Wout-of-bounds to the many vec<...> might be a
> > requirement.
> > However I have to look into more details for that one, I don't see
> > yet how
> > it could be done without a similar reuse of the assertions as for the
> > libstdc++.)
> >
> > From what I saw, despite the bugs not being FIXED, vfuncs seem to be
> > working nicely enough after the fix from GSoC 2021.
>
> IIRC I was keeping those bugs open because there's still a little room
> for making the analyzer smarter about the C++ type system e.g. if we
> "know" that a foo * is pointing at a particular subclass, maybe we know
> things about what vfunc implementations could be called.
>
> We could even try an analysis mode where we split the analysis path at
> a vfunc call, where we could create an out-edge in the egraph for each
> known concrete subclass for foo *, so that we can consider all the
> possible subclasses and the code that would be called.  (I'm not sure
> if this is a *good* idea, but it intrigues me)
>

Like adding a flag to run in a non-standard mode, to debug when an
unexpected vfunc analysis occurs ? TBH I didn't look that much into vfuncs
support, as my dummy tests behave OK and I assumed it was fixed after last
GSoC.


>
> >
> > Unfortunately I couldn't devote as much time as I wanted to gcc
> > yesterday,
> > I plan to send a proposal draft tomorrow evening. Sincerely sorry for
> > the
> > short time frame before the deadline.
>
> Sound promising.  Note that the deadline for submitting proposals to
> the official GSoC website is April 4 - 18:00 UTC (i.e. this coming
> Tuesday) and that Google are very strict about that deadline; see:
> https://developers.google.com/open-source/gsoc/timeline
>
> I believe you've already had a go at posting gcc patches to our mailing
> list: that's a great thing to mention in your application.
>
Thanks for the tip, I added it to my draft !

>
> Good luck!
> Dave
>
> Thanks ! BTW here is my draft proposal (in a google doc, I hope this is
OK).
If you can find the time to give me some feedback (as always), I would
greatly appreciate it !
Below I will dump the "project goals" part, so that it's openly available
on the mail list.
Note that I've annotated some sections with [RFC], it's for your
easy-of-use when reviewing part I'm explicitly asking for feedback. Just do
a Ctrl-F on the string [RFC]

A bit out of context, but since you always sign your mails 'Dave', should I
address you that way ? Unsure about that.

Best,
Benjamin. (see below for the dump)

The aim of this project is to enable the analyzer to self-analyze itself.
To do so, the following items should be implemented (m: minor, M: Major
feature)
> Generalize gcc.dg/analyzer tests to be run with both C and C++ [PR96395]
[M]
> Support the options relative to checker sm-malloc
   -Wanalyser-double-free should behave properly for C++ allocations
pairs new, new[], delete and delete[] both throwing and non-throwing
versions.
At the moment, only their non-throwing counterparts are somewhat
handled, yet incorrectly as the expected -Wanalyzer-double-free is replaced
by -Wanalyzer-use-after-free [m] and an incorrect
-Wanalyzer-possible-null-dereference is emitted [fixed].
 I filed it as bug PR109365 [2].
> Add support for tracking unique_ptr null-dereference [M]. While smart_ptr
is correctly handled, the code snippet below demonstrates that this warning
is not emitted for unique_ptr [4].
Figure 1 - First test case for unique_ptr support
struct A {int x; int y;};
int main () {
  std::unique_ptr a;
  a->x = 12; /* -Wanalyzer-null-dereference missing */
  return 0;
}
> Improve the diagnostic path for the standard library, with shared_ptr as
a comparison point, so that they do not wander through the standard library
code. [M]
Figure 2 - Reproducer to demonstrate unnecessarily long diagnostic paths
when using the standard library.
struct A {int x; int y;};
int main () {
  std::shared_ptr a;
  a->x = 4; /* Diagnostic path should stop here rather than going to
shared_ptr_base.h */
  return 0;
   }
[RFC] I believe this could be a 350-hours project as time flies by quickly,
but I’m more than open to your suggestions to support self-analysis. I’ve
read your idea on splitting at vfunc points, I’m currently looking into it.
An additional goal I have considered is to add out-of-bounds support for
the auto_vec. This would include supporting templates, but a shallow

Re: [GSoC][analyzer-c++] Submission of a draft proposal

2023-04-03 Thread Benjamin Priour via Gcc
Following last mail, a classic I forgot to link my draft !
https://docs.google.com/document/d/1MaLDo-Rt8yrJIvC1MO8SmFc6fp4eRQM_JeSdv-1kbsc/edit?usp=sharing

Best,
Benjamin.

On Mon, Apr 3, 2023 at 6:44 PM Benjamin Priour  wrote:

> Hi David,
>
> On Mon, Apr 3, 2023 at 12:38 AM David Malcolm  wrote:
>
>>
>> To be fair, C ones can be as well; the analyzer's exploded graphs tend
>> to get very big on anything but the most trivial examples.
>>
>>
>>
> [...snip...]
>
>
>>
>> Indeed - you'll have to do a lot of looking at gimple IR dumps, what
>> the supergraph looks like, etc, for all of this.
>>
>>
> Yep. I have already tried my hands on them, but to be fair I'm still often
> troubled by them. Still,
> doing so have already provided essential insight on the analyzer.
>
> [...snip...]
>
>
>> > 4. Extension of out-of-bounds
>> >  ( - Extending -Wout-of-bounds to the many vec<...> might be a
>> > requirement.
>> > However I have to look into more details for that one, I don't see
>> > yet how
>> > it could be done without a similar reuse of the assertions as for the
>> > libstdc++.)
>> >
>> > From what I saw, despite the bugs not being FIXED, vfuncs seem to be
>> > working nicely enough after the fix from GSoC 2021.
>>
>> IIRC I was keeping those bugs open because there's still a little room
>> for making the analyzer smarter about the C++ type system e.g. if we
>> "know" that a foo * is pointing at a particular subclass, maybe we know
>> things about what vfunc implementations could be called.
>>
>> We could even try an analysis mode where we split the analysis path at
>> a vfunc call, where we could create an out-edge in the egraph for each
>> known concrete subclass for foo *, so that we can consider all the
>> possible subclasses and the code that would be called.  (I'm not sure
>> if this is a *good* idea, but it intrigues me)
>>
>
> Like adding a flag to run in a non-standard mode, to debug when an
> unexpected vfunc analysis occurs ? TBH I didn't look that much into vfuncs
> support, as my dummy tests behave OK and I assumed it was fixed after last
> GSoC.
>
>
>>
>> >
>> > Unfortunately I couldn't devote as much time as I wanted to gcc
>> > yesterday,
>> > I plan to send a proposal draft tomorrow evening. Sincerely sorry for
>> > the
>> > short time frame before the deadline.
>>
>> Sound promising.  Note that the deadline for submitting proposals to
>> the official GSoC website is April 4 - 18:00 UTC (i.e. this coming
>> Tuesday) and that Google are very strict about that deadline; see:
>> https://developers.google.com/open-source/gsoc/timeline
>>
>> I believe you've already had a go at posting gcc patches to our mailing
>> list: that's a great thing to mention in your application.
>>
> Thanks for the tip, I added it to my draft !
>
>>
>> Good luck!
>> Dave
>>
>> Thanks ! BTW here is my draft proposal (in a google doc, I hope this is
> OK).
> If you can find the time to give me some feedback (as always), I would
> greatly appreciate it !
> Below I will dump the "project goals" part, so that it's openly available
> on the mail list.
> Note that I've annotated some sections with [RFC], it's for your
> easy-of-use when reviewing part I'm explicitly asking for feedback. Just do
> a Ctrl-F on the string [RFC]
>
> A bit out of context, but since you always sign your mails 'Dave', should
> I address you that way ? Unsure about that.
>
> Best,
> Benjamin. (see below for the dump)
>
> The aim of this project is to enable the analyzer to self-analyze itself.
> To do so, the following items should be implemented (m: minor, M: Major
> feature)
> > Generalize gcc.dg/analyzer tests to be run with both C and C++ [PR96395]
> [M]
> > Support the options relative to checker sm-malloc
>-Wanalyser-double-free should behave properly for C++ allocations
> pairs new, new[], delete and delete[] both throwing and non-throwing
> versions.
> At the moment, only their non-throwing counterparts are somewhat
> handled, yet incorrectly as the expected -Wanalyzer-double-free is replaced
> by -Wanalyzer-use-after-free [m] and an incorrect
> -Wanalyzer-possible-null-dereference is emitted [fixed].
>  I filed it as bug PR109365 [2].
> > Add support for tracking unique_ptr null-dereference [M]. While
> smart_ptr is correctly handled, the code snippet below demonstrates that
> this warning is not emitted for unique_ptr [4].
> Figure 1 - First test case for unique_ptr support
> struct A {int x; int y;};
> int main () {
>   std::unique_ptr a;
>   a->x = 12; /* -Wanalyzer-null-dereference missing */
>   return 0;
> }
> > Improve the diagnostic path for the standard library, with shared_ptr as
> a comparison point, so that they do not wander through the standard library
> code. [M]
> Figure 2 - Reproducer to demonstrate unnecessarily long diagnostic paths
> when using the standard library.
> struct A {int x; int y;};
> int main () {
>   std::shared_ptr a;
>   a->x = 4; /* Diagnostic path should stop here rather than going 

-Wanalyzer-use-of-uninitialized-value always shadows -Wanalyzer-out-of-bounds

2023-04-05 Thread Benjamin Priour via Gcc
Hi David,

I used the below code snippet to experiment with out-of-bounds (OOB) on
trunk. Three things occurred that I believe could see some improvement. See
https://godbolt.org/z/57n459cEs for the warnings.

int consecutive_oob_in_frame ()
{
int arr[] = {1,2,3,4,5,6,7};
int y1 = arr[9]; // only  this one get warnings (3*2 actually), expect
only 1 OOB though
int y2 = arr[10]; // expect a warning too, despite fooling with asm
int y3 = arr[50]; // expect a warning for that one too (see asm)
return (y1+y2+y3);
}

int goo () {
int x = consecutive_oob_in_frame (); // causes another pair of warnings
return 2 * x;
}

int main () {
goo (); // causes another pair of warning
consecutive_oob_in_frame (); // silent
int x [] = {1,2};
x[5]; /* silent, probably because another set of OOB warnings
has already been issued with this frame being the source */
return 0;
}


First, as the subject line reads, I get a
-Wanalyzer-use-of-uninitialized-value for each -Wanalyzer-out-of-bounds. I
feel it might be too much, as fixing the OOB would fix the former.
So maybe only OOB could result in a warning here ?

Second, it seems that if a frame was a cause for a OOB (either by
containing the spurious code or by being a caller to such code), it will
only emit one set of warning, rather than at each unique compromising
statements.

Finally, I think the diagnostic path should only go at deep as the
declaration of the injurious index.

What do you think ? If they also make sense to you I will open a RFE and
try my hands at fixing them.

Also, have you considered extending the current call summaries model
(symbolic execution from what you told me), to implement a partial VASCO
model ? There would still be no need for distributive problems.
Maybe a start could be that functions without parameters working on
non-mutable global data should not generate EN more than once, rather than
at each call sites.

Best,
Benjamin.


Re: -Wanalyzer-use-of-uninitialized-value always shadows -Wanalyzer-out-of-bounds

2023-04-06 Thread Benjamin Priour via Gcc
Hi David,
I haven't yet looked into your suggestions, probably won't have time until
tomorrow actually :/
Still, here are some updates

On Thu, Apr 6, 2023 at 2:32 AM David Malcolm  wrote:

> On Wed, 2023-04-05 at 19:50 +0200, Benjamin Priour wrote:
> > Hi David,
> >
> > I used the below code snippet to experiment with out-of-bounds (OOB)
> > on
> > trunk. Three things occurred that I believe could see some
> > improvement. See
> > https://godbolt.org/z/57n459cEs for the warnings.
> >
> > int consecutive_oob_in_frame ()
> > {
> > int arr[] = {1,2,3,4,5,6,7};
> > int y1 = arr[9]; // only  this one get warnings (3*2 actually),
> > expect
> > only 1 OOB though
> > int y2 = arr[10]; // expect a warning too, despite fooling with
> > asm
> > int y3 = arr[50]; // expect a warning for that one too (see asm)
> > return (y1+y2+y3);
> > }
> >
> > int goo () {
> > int x = consecutive_oob_in_frame (); // causes another pair of
> > warnings
> > return 2 * x;
> > }
> >
> > int main () {
> > goo (); // causes another pair of warning
> > consecutive_oob_in_frame (); // silent
> > int x [] = {1,2};
> > x[5]; /* silent, probably because another set of OOB warnings
> > has already been issued with this frame being the source */
> > return 0;
> > }
>
>

> There's quite a bit of duplication here.  My recollection is that
> there's code in the analyzer that's meant to be eliminating some of
> this e.g. we want to show the OOB when consecutive_oob_in_frame is
> called directly; we *don't* want to show it when
> consecutive_oob_in_frame is called by goo.  Perhaps this deduplication
> code isn't working?  Can you reproduce similar behavior with C, or is
> it specific to C++?
>
>
Identical behavior both in C and C++. I will look at this code, any hint at
where it starts ?
Otherwise I would find it the good old way.


>
> >
> > First, as the subject line reads, I get a
> > -Wanalyzer-use-of-uninitialized-value for each -Wanalyzer-out-of-
> > bounds. I
> > feel it might be too much, as fixing the OOB would fix the former.
> > So maybe only OOB could result in a warning here ?
>
> Yes, that's a good point.  Can you file a bug about this in bugzilla
> please?  (and feel free to assign it to yourself if you want to have a
> go at fixing it)
>

Unfortunately the Assignee field is grayed out for me in both enter_bug.cgi
and show_bug.cgi.
I've also created a new tracker bug for out-of-bounds, as there is a number
of related bugs.


>
> Maybe we could fix this by having region_model::check_region_bounds
> return a bool that signifies if the access is valid, and propagating
> that value up through callers so that we can return a non-
> poisoned_svalue at the point where we'd normally return an
> "uninitialized" poisoned_svalue.
>
> Alternatively, we could simply terminate any analysis path in which an
> OOB access is detected (by implementing the
> pending_diagnostic::terminate_path_p virtual function for class
> out_of_bounds).
>

I'm adding your suggestions as comment to the filed bugs so as to not
forget them.


>
> >
> > Second, it seems that if a frame was a cause for a OOB (either by
> > containing the spurious code or by being a caller to such code), it
> > will
> > only emit one set of warning, rather than at each unique compromising
> > statements.
>
> Maybe.  There's a pending_diagnostic::supercedes_p virtual function
> that perhaps we could implement for out_of_bounds (or its subclasses).
>
>

> >
> > Finally, I think the diagnostic path should only go at deep as the
> > declaration of the injurious index.
>
> I'm not quite sure what you mean by this, sorry.
>
>
Indeed not the best explanation so far. I was actually sort of suggesting
to only emit OOB only on direct call sites,
you did too, so in a way you have answered me on this.

Just an addition though: if there is an OOB independent of its enclosing
function's parameters, I think
it might make sense to not emit for this particular OOB outside the
function definition itself.
Meaning that no OOB should be emitted on call sites to this function for
this particular array access.
(Typically, consecutive_oob_in_frame () shouldn't have resulted in more
than one warning, since the OOB within is independent of its parameters).

I believe right now the expected behavior is to issue warnings only on
actual function calls, so that a function never called
won't result in warnings. As a result, the initial analysis of each
functions should never results in warnings -actually the case for
malloc-leak,
not for OOB though-.
Thus we would need to tweak this into actually diagnosing the issues on
initial analysis -those that can be at least-, so that they are saved for a
later
use whenever the function is actually called. Then we would emit them once,
and only once, because by nature these diagnostics are parameters
independent.
I hope I made it clearer, not more convoluted.

>
> > Also, have you considered extending the current call summaries 

PR109439 - Terminate analysis path when OOB detected

2023-05-01 Thread Benjamin Priour via Gcc
Hi David,

Sorry for not being active these past two weeks I've been overwhelmed at my
university with creating a new club and other uni stuff.

I just went back to solving these 3 bugs we discussed last time.


PR109439  is the one
about the double warnings emitted (both OOB and use of uninitialized value).
Your second suggestion of terminating the diagnostic path on an OOB proved
to interfere with PR109437.
I might actually close PR109437 as solving PR109439 will probably solve it
too.
I'm going with your first suggestion of bubbling a boolean from
check_region_bounds up to get_rvalue.
I'm considering two approaches
 1. preventing the check_for_poison call if there is an OOB Read.
 2. or marking the OOB values as Unknown rather than Poisoned, but then we
are semantically incorrect.

Another unrelated question, I felt like the use of an uninitialized value
terminating the path was a bit strong. No other warnings will be considered
for the remaining of the function if there is such use, even for unrelated
stuff. Like a double free on a completely different variable.
Couldn't we tune that so we only ignore everything related to any variable
tainted by this uninitialized value ?

Sorry again for the past weeks, the club is finally running (somewhat).
Benjamin.

PS: I submitted a patch, bootstrapped and regtested, for the bug I was
solving on gcc-request. I guess I'm not too clear on the process of
submitting a patch, as I probably had to commit and push it afterwards,
sadly there was no feedback on the previous RFC as well as on the patch
submission - no blaming at all, people are busy and the flow of mails is
massive.
I believe I still don't have the right to commit it directly to the repo,
and honestly even if I would using my fresh gcc account, I would prefer not
to commit it myself for the first patch, I don't wanna mess with anything
because of an oversight.


Testing a patch

2023-05-29 Thread Benjamin Priour via Gcc
Hi,

Regstrapping finally done for PR109439 - Spurious
-Wanalyzer-use-of-uninitialized-value tagging along
-Wanalyzer-out-of-bounds.
Now only a OOB warning is reported when necessary instead of OOB + Use
of uninitialized value.

Some tests in analyzer (out-of-bounds-*, realloc-5, pr101962) were
checking for the now removed use-of-uninitialized-value warning, and
therefore I fixed that.

But now I'm confused since the documentation reads that to perform
regtesting, one should use make -k check,
and that's what I always use too, but because I fixed the above test
files, contrib/compare_tests obviously complains about them having
disappeared.
Does it mean regtesting failed ? Can I submit the patch in its current
state or should I do something else before doing so ?
Guess I would get feedback anyway if something's wrong.

I figured I would send it here rather than to gcc-patches, as it's
more general than a discussion over a single patch.

Thanks,
Benjamin


Moving analyzer tests to c-c++-common

2023-06-06 Thread Benjamin Priour via Gcc
Hi David,

Sorry I didn't answer you earlier, I was busy concluding my term.
Finally finished yesterday, I now have 100% of my time for GCC.

I build yesterday the analyzer with -fanalyzer enabled. Globally there
are not many coherent warnings, and a common issue are the thesis-long
warnings.
Below are the headers of these warnings, the first two were emitted
dozen of times across the build.
I'll look into these two particularly as it would clean up a lot of
noise easily enough.

../../gcc/gcc/wide-int.h:1338:30: warning: use of uninitialized value
‘‘result_decl’ not supported by dump_expr’ [CWE-457]
[-Wanalyzer-use-of-uninitialized-value]
../../gcc/gcc/analyzer/bounds-checking.cc:658:44: warning: use of
uninitialized value ‘’ [CWE-457]
[-Wanalyzer-use-of-uninitialized-value]
../../gcc/gcc/make-unique.h:41:30: warning: use of possibly-NULL
‘operator new(120)’ where non-null expected [CWE-690]
[-Wanalyzer-possible-null-argument]

I'm also adding new tests for c++.
Specifically I'm writing tests for the operators [placement] new and
delete, as a number of false-positives occurred repeatedly when
building the analyzer.
I'm also reediting some of the c tests but with their c++
counterparts, such as using the standard libraries containers and see
how the analyzer behaves.

Should I put them under c-c++-common or c++ ? With a new analyzer
folder and analyzer.exp I guess.
For the warnings diving too deep into the standard library, I
considered adding a flag to the analyzer to control the maximum depth
of the warnings.

Have a nice day,
Benjamin.


An overview of the analyzer support of the operator new

2023-06-07 Thread Benjamin Priour via Gcc
Hi,

I've been mapping where the analyzer is lacking support of the operator new
different variants.
I've written a bunch of test cases already to demonstrate it, you can find
them below.
They are not yet formatted for a patch submission, and as some of them may
require new warnings, I didn't use dg-* directives either.
You will notice I included true positives and negatives as well, as I think
they might spur ideas on some edge cases that may fail.
All that to say I would greatly appreciate your comments if any test is
wrong, or if you have pointers on additional test cases.
You can also find a godbolt  here.

The most annoying one is the recurrent noisy false positive
-Wanalyzer-possible-null-argument on usage of a new expression.
Although a placement new on a static buffer too short is flagged by the
middle-end, the analyzer stay quiet.
A placement on a dynamic buffer too short to contain the placement is never
reported however. See PR105948


Thanks,
Benjamin

#include 

struct A
{
int x = 4;
int y = 6;
};

void test1()
{
int *x = ::new int; // true negative on -Wanalyzer-possible-null-argument
int *arr = ::new int[3]; // true negative on
-Wanalyzer-possible-null-argument
A *a = ::new A(); // false positive -Wanalyzer-possible-null-argument (a
throwing new cannot returns null)
::delete a;
::delete x;
::delete[] arr;
}

void test_allocators_mismatch()
{
int *a = ::new int;
int *b = ::new int[3];

::delete[] a; /* true positive -Wanalyzer-mismatching-deallocation flagged
*/
::delete b; /* true positive -Wanalyzer-mismatching-deallocation flagged */
}

// From clang core.uninitialized.NewArraySize
void test_garbage_new_array()
{
int n;
int *arr = ::new int[n]; /* true positive
-Wanalyzer-use-of-uninitialized-value reported for 'n' */
/* however nothing is reported for 'arr', even with
'-fno-analyzer-suppress-followups', one could expect a specific warning */
::delete[] arr; /* no warnings here either */
}

void test_placement()
{
void *chunk = ::operator new(20); // true negative
-Wanalyzer-possible-null-dereference
A *a = ::new (chunk) A();
a->~A();
::operator delete(chunk);
}

void test_delete_placement()
{
A *a = ::new A; // false positive -Wanalyzer-possible-null-argument
(throwing new)
int *z = ::new (&a->y) int;
a->~A(); // deconstruct properly
::operator delete(a);
::operator delete(z); // nothing from analyzer but got
-Wfree-nonheap-object, even though analyzer also has
Wanalyzer-free-of-non-heap
}

void test_write_placement_after_delete()
{
short *s = ::new short;
long *lp = ::new (s) long;
::delete s;
*lp = 12; // true positive -Wanalyzer-use-after-free flagged, as well as a
wrong -Wanalyzer-null-dereference of lp
}

void test_read_placement_after_delete()
{
short *s = ::new short;
long *lp = ::new (s) long;
::delete s;
long m = *lp; // true positive -Wanalyzer-use-after-free flagged, as well
as a wrong -Wanalyzer-null-dereference of lp
}

void test_use_placement_after_destruction()
{
A a;
int *lp = ::new (&a.y) int;
a.~A();
int m = *lp; /* true positive -Wanalyzer-use-of-uninitialized-value,
nothing about use-after-delete though */
}

// From clang cplusplus.PlacementNewChecker
void test_placement_size_static()
{
short s;
long *lp = ::new (&s) long; /* nothing from analyzer, but still got
-Wplacement-new= */
}

void test_placement_size_dynamic()
{
short *s = ::new short;
long *lp = ::new (s) long; // Nothing reported here at all, would expect a
-Wanalyzer-placement-new=
::delete s;
}

void test_placement_null()
{
int *x = nullptr;
int *p = ::new (x) int; // Placement new on NULL is undefined, yet nothing
is reported.
::operator delete(x);
}

void test_initialization_through_placement()
{
int x;
int *p = ::new (&x) int;
*p = 10;
int z = x + 2; // Everything is fine, no warning emitted
}


Re: Inquiry about Getting Started with GCC Projects as a Beginner

2023-06-30 Thread Benjamin Priour via Gcc
Hi,

I am a current GSoC student on GCC, so I'll let some of your questions to
more veteran contributors. However, feel free to also reach out to me if
you got any questions as to where to start, as I consider myself just out
of this phase.

Programming Languages: Which programming languages are commonly used in
   GCC projects? Are there any specific languages that you recommend
beginners
   to focus on initially?

GCC is written in a customized mix of C++ and C. Some C++ standard
constructs have been handmade, such as vectors. However, if you are
familiar with C++, I believe you'll adapt quickly to those. If you consider
contributing to some language front-end, or to some specific back-end, then
knowledge on that front will always be useful. However, from my short
experience, I can already tell you that GCC is in itself a great teacher,
as long as you have the basics, or put the extra effort to get them, you'll
step up your game quickly.

Tools: What are the essential tools utilized in GCC projects? It would
   be helpful to know which tools are commonly used for tasks such as
version
   control, project management, documentation, and testing.

Git is the one tool you have to get used for GCC, as for testing it is a
bit old-fashioned with a TCL based testsuite (see Dejagnu). I'd encourage
you to try install GCC from the sources, and poke around them in your
favorite editor. It might be a bit rough at first, but that's also how
you'll progress quickly.
Obviously, if you consider contributing to the development of GCC, you will
need to be debug some things. I think GDB is the most used debugger here.

 Learning Resources: Could you please recommend any online resources,
   tutorials, or documentation that can assist beginners in understanding
GCC
   projects and the associated technologies?

I would recommend https://gcc.gnu.org/contribute.html obviously, but also
https://gcc-newbies-guide.readthedocs.io/en/latest/index.html, that will
soon get extra content. If you decide to take a look at the sources, see
the documentation https://gcc.gnu.org/onlinedocs/gccint/index.html.
I am also writing a blog to keep track of my GSoC project, once I have a
proper open source license set up I will send you the link (in a few hours
if I'm not lazy).

Good luck and feel free to reach out,
Benjamin.


analyzer: New state machine should be C++ only

2023-07-12 Thread Benjamin Priour via Gcc
Hi David,


Lately I've been working on adding a new state machine to keep track of
ownership transfers

and misuses, e.g. to warn about use-after-move, partial or shallow
copy/move.

I'm trying to stay abstracted from heap allocated regions, and to rather
work with "resources",

so that the state machine could be easily further extended.

However, the whole concern of ownership is really C++-like, and most of the
checks would require

things unheard of in vanilla C, such as copy/move operators, ctors & dtors
...


Using those constructs, it is really doable to guess ownership of
resources, whereas without them it becomes

much more hazardous.

So, should we make this new sm -adroitly called sm-ownership- C++-only ?


Doing so would allow the sm to reuse code from under cp/*, thus it'd reduce
duplicating code and would

likely lead to less false positives in C++ -more precise function checks-,
though it would make any future C-support more tedious.

It's also going against the current flow of porting what's already done for
C to C++.


Best,

Benjamin.


Re: analyzer: New state machine should be C++ only

2023-07-13 Thread Benjamin Priour via Gcc
Hi,
On 13/07/2023 10:48, David Malcolm wrote:

(apologies for top-posting; I'm on vacation and don't have my usual email setup)

Sounds interesting, but I'm having difficulty imagining exactly what
you have in mind.

Can you post one or more concrete examples of buggy code that would be
caught by such a warning?

I'm still writing test cases as I come up with more buggy code, and should
post them as comments

under the existing PR, or open new ones. As for now, you can find below
some of them.


Shallow copy/move operations, when such operation is done on an object
owning a resource,

but no user-defined operation was provided (see PR107534
)

// from PR107534

struct S {
S() { p = new int(); }
~S() { delete p; }
int* p = nullptr;
};

int main() {
S s;
S ss = s; /* { dg-warning "shallow copy of instance of 'S' lacking
user-defined copy constructor" "-Wanalyzer-shallow-ownership-transfer"  {
xfail *-*-* } }  */
}

User-defined move and copy operations, but that are not moving everything
properly.

struct S
{
  std::string s1;
  std::string s2;

  S (S &&other)
  {
s1 = std::move (other.s1);
// no mention of s2
  } /* { dg-warning "partial move of instance of 'S' missing true move of
's2'" "-Wanalyzer-shallow-ownership-transfer"  { xfail *-*-* } }  */
};

Deletion of resource we do not own (with introduction of [[gnu::owner]] as
in PR106390 )

[[gnu::owner]]
extern void deallocator (int *ptr);

void test1 ()
{
  int *p = new int;
  deallocator (p);
  delete p; /* { dg-warning "'p' released resource owned by 'deallocator'"
"-Wanalyzer-ownership" { xfail *-*-* } } */
}

This would somewhat overlap with double-delete warning. So probably this
one would only be useful for the case above,

i.e. for a function we don't have the body of. Although using this warning
could give hints as to which 'delete' among the two is injurious.

Note that we would not flag a double resource release from the resource's
owner, at it is the job of sm-malloc at the moment.

The above attribute [[gnu::owner]] would help in supporting unique_ptr.


Since we keep track of who the owner of a resource is, we can also flag
use-after-move PR106388


struct S {
  std::string s;
  S (S && oth) : s (std::move(oth.s)) {}

};
void test_move ()
{
  S ss1;
  S ss2 = std::move (ss1);
  ss1.s.append('c'); /* { dg-warning "use-after-move of 'ss1.s'"
"-Wanalyzer-use-after-move" { xfail *-*-* } }  */
}


Why wouldn't it be caught by C++'s syntax-based ownership support?

Sorry I'm not sure I understand what you meant here. Are you talking
about the c++ guidelines about

pointers vs reference to represent ownership semantics ?

  Is there a PR in bugzilla for this idea?


None yet, but more like a collection of bugs (cited above) pointing
toward that idea.

Best,

Benjamin.


analyzer: Weekly update and questions on extending c++ support.

2023-07-31 Thread Benjamin Priour via Gcc
Hi David,

(subject's line totally stolen inspired from Eric's)
Last week as you know had some special circumstances therefore not much has
been done.

I've changed my in-progress patch about trimming the system headers event
to fit your suggestions from
https://gcc.gnu.org/pipermail/gcc-patches/2023-July/625245.html.
I had suggested that we keep events (1) (2) (11) and (12) in the reproducer
https://godbolt.org/z/sb9dM9Gqa
but after comparing it to your first suggestion of only keeping (11), I
don't know which one to keep.
Just (11) has the benefit of being really concise and verbose enough in
this case, mimicking the behavior of a raw pointer.
However having the call site and return site might be helpful for the user,
e.g. to understand what template arguments were deduced.

I think we could still have events (11) and (12): what if for call and
> return events we consider the location of both the call site and the
> called function: we suppress if both are in a system header, but don't
> suppress if only one is a system header.  That way by default we'd show
> the call into a system header and show the return from the system
> header, but we'd suppress all the implementation details within the
> system header.
>
> Does that seem like it could work?


It works and is one the two implementations I have.
Now we just have to decide which is better, and I'll submit the patch to
the mail list.

Some updates too on the added support of placement new

What do you think of "null-dereference" superceding
"use-of-unitialized-value" ?
Both can occur at the same statement (see
https://gcc.gnu.org/pipermail/gcc-patches/2023-July/625844.html)
If you think adding this supercedes_p to null-deref is acceptable, then
I'll fix PR 110830 
before submitting the placement new patch.


Otherwise, I've been working on what we talked about last time, i.e. moving
the tests under gcc.dg/analyzer
to c-c++-common/analyzer. It is still in progress (about 100 tests done out
of 700+).

A question on that front:
What is the purpose of test gcc.dg/analyzer/data-model-11.c ?

int test (void)
{
  unsigned char *s = (unsigned char *) "abc";
  char *t = "xyz";
  return s[1] + t[1];
}

Are the non-constness of the strings being tested or only their sign ?

Thanks
Benjamin.


analyzer: Weekly update on extending C++ support (2)

2023-08-07 Thread Benjamin Priour via Gcc
Hi Dave,

I made some more progress on moving tests from gcc.dg/analyzer/ to
c-c++-common/analyzer.
I'll only detail the most noteworthy tests I encountered.

gcc.dg/analyzer/pr103892.c troubled me with an Excess error when compiled
with g++
analysis bailed out early (91 'after-snode' enodes; 314 enodes)
[-Wanalyzer-too-complex]

pr103892.c is compiled with optimization level -O2.
Analysis bails out when the number of "after-SN" enodes explodes, i.e.
exceeds a certain proportion of the total number of SG nodes.
limit(after-SN) = m_sg.num_nodes ()  * param-bb-explosion-factor.
The reason why C and C++ differs is simply due to what '-O2' does to each
of them.
Under -O0 or -O1, there is no failure whatsoever, under -O2 only C++ fails,
whereas with -O3 both gcc and g++ emits -Wanalyzer-too-complex.
With -O2, although GCC produces a greater total number of "after-SN" enodes
than G++, their proportion barely stays under limit(after-SN) as
the total number of SN is also bigger, hence no warning is emitted.

All in all, I don't believe there is a significant difference here between
C and C++, nor is there much we can do about this.
Therefore I'll simply add a { dg-warning "analysis bailed out early" "" {
target c++ } }

An interesting divergence between GCC and G++ was in the handling of
built-ins.
Tests gcc.dg/analyzer/{pr104062.c,sprintf-2.c} are failing when compiling
with G++.
FAIL: c-c++-common/analyzer/pr104062.c  leak of ap7 at line 13 (test for
warnings, line 12)
The reason is neither realloc nor sprintf are considered by G++ as a
built-in, contrary to GCC.
C built-ins are mapped within the analyzer to known_functions using their
'enum combined_fn' code
(See kf.cc:register_known_functions), not their function name.
Therefore, we find a built-in known function by checking
tree.h:fndecl_built_in_p returns true
and calling known_function_manager::get_normal_builtin.
The former returns false for 'realloc' and 'sprintf' when using G++.

To fix that, I've derived builtin_known_function from known_function.

/* Subclass of known_function for builtin functions.  */

class builtin_known_function : public known_function
{
public:
  virtual enum built_in_function builtin_code () const = 0;
  tree builtin_decl () const {
gcc_assert (builtin_code () < END_BUILTINS);
return builtin_info[builtin_code ()].decl;
  }

  virtual const builtin_known_function *
  dyn_cast_builtin_kf () const { return this; }
  virtual builtin_known_function *
  dyn_cast_builtin_kf () { return this; }
};

And 'kfm.add (BUILT_IN_REALLOC, make_unique ());' becomes
kfm.add ("realloc", make_unique ());
kfm.add ("__builtin_realloc", make_unique ());

Unfortunately we have to register the built-ins using their function name
which is more apt to bugs,
and the double call to kfm.add is quite messy. Having two instances of
kf_realloc however is not that troubling,
as builtin_known_function objects are lightweight.

That GCC built-ins are not recognized by G++ doesn't only impede their
detection,
but also how we process them, and what information we have of them.
In gcc.dg/analyzer/sprintf-2.c, sprintf signature follows the manual and
cppreference.
Yet we expect sm-malloc to warn about use of NULL when either of the
argument is NULL,
although there is no attribute(nonnull) specified.
In fact, sm-malloc isn't using the signature of sprintf as specified in the
test case, but rather the one provided
by GCC in builtins.def

/* See e.g. https://en.cppreference.com/w/c/io/fprintf
   and https://www.man7.org/linux/man-pages/man3/sprintf.3.html */

  extern int
  sprintf(char* dst, const char* fmt, ...)
__attribute__((__nothrow__)); // No attribute(nonnull)

int
test_null_dst (void)
{
  return sprintf (NULL, "hello world"); /* { dg-warning "use of NULL where
non-null expected" } */
}

int
test_null_fmt (char *dst)
{
  return sprintf (dst, NULL);  /* { dg-warning "use of NULL where non-null
expected" } */
}

Signature in builtins.def: DEF_LIB_BUILTIN(BUILT_IN_SPRINTF,
"sprintf", BT_FN_INT_STRING_CONST_STRING_VAR,
ATTR_NOTHROW_NONNULL_1_FORMAT_PRINTF_2_3)

My question is then : Should G++ behave as GCC and ignore the user's
signatures of built-ins, instead using the attributes specified by GCC ?
At the moment I went with a "yes". If the function is a builtin, I'm
operating on its builtin_known_function::builtin_decl () (see above) rather
than the callee_fndecl
deduced from a gimple call, therefore overriding the user's signature.

Doing so led to tests sprintf-2.c and realloc-1.c to pass both in GCC and
G++.


Last test I'd like to discuss is analyzer/pr99193-1.c

/* { dg-additional-options "-Wno-analyzer-too-complex" } */

/* Verify absence of false positive from -Wanalyzer-mismatching-deallocation
   on realloc(3).
   Based on
https://github.com/libguestfs/libguestfs/blob/f19fd566f6387ce7e4d82409528c9dde374d25e0/daemon/command.c#L115
   which is GPLv2 or later.  */

typedef __SIZE_TYPE__ size_t;
typedef __builtin_va_list va_list;

#ifdef __cp

analyzer: How to recognize builtins

2023-08-23 Thread Benjamin Priour via Gcc
Hi David,

A quick update on transitioning the analyzer tests from gcc.dg to
c-c++-common.
As discussed previously, C builtins are not C++ builtins, therefore I had
to add
recognition of the builtins by their name rather than with the predicate
fndecl_built_in_p.

Do you know why this happens?  I see both of those tests have their own
> prototypes for the functions in question, rather than using system
> headers; maybe those prototypes don't match some property that the C++
> FE is checking for?
>

The C FE relies on c-decl.cc:c_builtin_function, the macro
C_DECL_BUILTIN_PROTOTYPE,
as well as c-decl.cc:match_builtin_function_types to recognize a builtin
function decl
and provide a unified type between what the user wrote, and the known
builtin fndecl.

I'd like however to correct something I've said previously

My question is then : Should G++ behave as GCC and ignore the user's
> signatures of built-ins, instead using the attributes specified by GCC ?
> At the moment I went with a "yes". If the function is a builtin, I'm
> operating on its builtin_known_function::builtin_decl () (see above) rather
> than the callee_fndecl
> deduced from a gimple call, therefore overriding the user's signature.
>

Actually, GCC does not ignore the user's signatures of built-ins, but if
match_builtin_function_types finds out that the user's signature
is close enough to the one described in builtins.def, then it will complete
the user's add with attributes and the likes so that we get an unified type.

A case where the user's signature *isn't* close enough to the built-in's is
gcc.dg/analyzer/pr96841.c's malloc's signature.

__SIZE_TYPE__
malloc (__SIZE_TYPE__);

To simplify, whenever a Wbuiltin-declaration-mismatch is emitted, then the
user's decl isn't unified with the built-ins,
and the extra GCC's attributes defined in builtins.def are not added to the
user's decl.
In such cases, the function decl is not recognized as a builtin by
fndecl_builtin_p, hence the analyzer's (before my patch) known_function
is never found out.
What this mean is that pr96841.c's malloc uses don't go within
kf_malloc:impl_call_pre nor kf_malloc:impl_call_post with the current
analyzer's handling of built-ins - do note however that sm-malloc goes as
expected, as it recognizes malloc calls both by name
and builtin decl (sm-malloc.cc::known_allocator_p).

With my patch, since the recognition is by name and no longer dependent on
the fndecl_builtin_p predicate, pr96841's malloc *is*
recognized as a builtin, and its kf is processed.
This led to pr96841.c no longer passing, as the while loop makes the
analysis too complex. (-Wanalyzer-too-complex)

> > My question is then : Should G++ behave as GCC and ignore the user's
> > signatures of built-ins, instead using the attributes specified by
> > GCC ?
> > At the moment I went with a "yes". If the function is a builtin, I'm
> > operating on its builtin_known_function::builtin_decl () (see above)
> > rather
> > than the callee_fndecl
> > deduced from a gimple call, therefore overriding the user's
> > signature.
> >
> > Doing so led to tests sprintf-2.c and realloc-1.c to pass both in GCC
> > and
> > G++.
>
> I think I agree with "yes" here.

I would like to proceed with what we said here, so that the analyzer
proceeds
with the analysis of malloc or any other builtins even when their fndecl is
not recognized
as such. Then I would update pr96841.c to expect a -Wanalyzer-too-complex.
Besides, sm-malloc already makes the assumption that a call named malloc
should be treated
as a malloc, whether it is matching a builtin or not.

Thanks,
Benjamin.


Re: How can I run xg++ from its build location?

2023-08-26 Thread Benjamin Priour via Gcc
Hi,

If you further have issues with loading required libraries,
you could try running a test intended for g++ (make check-c++ from under
BUILDDIR/gcc/)

You may find examples at
https://gcc-newbies-guide.readthedocs.io/en/latest/working-with-the-testsuite.html#running-just-one-test-or-just-a-few

Then, open up BUILDDIR/gcc/testsuite/g++.dg/g++.log and search for
"Invoking the compiler as".
You will most certainly find helpful commands, populated with the actual
options your test is ran with by the testsuite, that you may later on adapt
to your needs.

Benjamin.


analyzer: Weekly update on extending C++ support (3)

2023-08-26 Thread Benjamin Priour via Gcc
Hi David,

Sorry I missed out on your answer about the operator new patch on the IRC.
Should I submit the first bit of the operator new patch ? Putting aside for
now fixing the
"uninitialized value" that accompanies "null deref" of nothrow new variants
?

About generalizing tests to C++, I'll soon have a second batch of similar
size ready,
probably by Monday. I try to find exactly one "real" bug to build each
batch around, to not simply
have a collection of C made C++ friendly.

A few questions on that point.

Test gcc.dg/analyzer/pr103892.c requires -O2.
As such the IPA generated from C and C++ differ,
C++ optimization cuts off function argstr_get_word from the IPA,
hence reducing the number of SN nodes from the SG.
This lesser number of SN is sufficient to make the analysis trips over
being too-complex.
The current formula for that upper limit is
limit = m_sg.num_nodes () * param_analyzer_bb_explosion_factor;
Thus shorter programs - such as the analyzer tests - are more likely to
be diagnosed as too complex. To avoid false positives perhaps we should
add an offset, so that short SG get their chance ?
This is just a tweak, and pr103892.c could as well be discarded for C++,
divergent IPA between C and C++ are unavoidable at some point, and some
tests won't make the transition anyway.

In gcc.dg/analyzer/malloc-1.c:test_50{b,c}, C++ is imprecise as of the
memory freed.

void test_50b (void)
{
  free ((void *) test_50a); /* { dg-warning "'free' of '&test_50a' which
points to memory not on the heap \\\[CWE-590\\\]" } */
  /* { dg-bogus "'free' of 'test_50a' which points to memory not on the
heap \\\[CWE-590\\\]" "c++ lacks precision of freed memory" { xfail c++ }
.-1 } */
}

void test_50c (void)
{
 my_label:
  free (&&my_label); /* { dg-warning "'free' of '&my_label' which points to
memory not on the heap \\\[CWE-590\\\]" } */
  /* { dg-warning "'free' of '&& my_label' which points to memory not on
the heap \\\[CWE-590\\\]" "" { xfail c++ } .-1 } */
}

What do you think of this ?
I've checked malloc_state_machine::handle_free_of_non_heap, arg and
diag_arg are strictly equal.
There might be something to be done with how a tree is transformed into a
diagnostic tree by get_diagnostic_tree,
but I'll wait for your feedback first.

Test gcc.dg/analyzer/pr94362-1.c actually has an additional null_deref
warning in C++, which is not affected by exceptions
or optimizations. I will keep updated on that one. Note that no warnings
are emitted for this test in C.

analyzer/pr94362-1.c: In function ‘const EVP_PKEY_ASN1_METHOD*
EVP_PKEY_asn1_find_str(ENGINE**, const char*, int)’:
analyzer/pr94362-1.c:55:16: warning: dereference of NULL ‘ameth’ [CWE-476]
[-Wanalyzer-null-dereference]
analyzer/pr94362-1.c:39:29: note: (1) entry to ‘EVP_PKEY_asn1_find_str’
analyzer/pr94362-1.c:42:31: note: (2) ‘ameth’ is NULL
analyzer/pr94362-1.c:53:43: note: (3) following ‘true’ branch...
analyzer/pr94362-1.c:54:31: note: (4) ...to here
analyzer/pr94362-1.c:54:31: note: (5) calling ‘EVP_PKEY_asn1_get0’ from
‘EVP_PKEY_asn1_find_str’
analyzer/pr94362-1.c:29:29: note: (6) entry to ‘EVP_PKEY_asn1_get0’
analyzer/pr94362-1.c:32:53: note: (7) ‘0’ is NULL
analyzer/pr94362-1.c:54:31: note: (8) returning to ‘EVP_PKEY_asn1_find_str’
from ‘EVP_PKEY_asn1_get0’
analyzer/pr94362-1.c:54:31: note: (9) ‘ameth’ is NULL
analyzer/pr94362-1.c:55:16: note: (10) dereference of NULL ‘ameth’


Cheers,
Benjamin.


Re: analyzer: Weekly update on extending C++ support (3)

2023-08-28 Thread Benjamin Priour via Gcc
Hi,

Test gcc.dg/analyzer/pr94362-1.c actually has an additional null_deref
> warning in C++, which is not affected by exceptions
> or optimizations. I will keep updated on that one. Note that no warnings
> are emitted for this test in C.
>
> analyzer/pr94362-1.c: In function ‘const EVP_PKEY_ASN1_METHOD*
> EVP_PKEY_asn1_find_str(ENGINE**, const char*, int)’:
> analyzer/pr94362-1.c:55:16: warning: dereference of NULL ‘ameth’ [CWE-476]
> [-Wanalyzer-null-dereference]
> analyzer/pr94362-1.c:39:29: note: (1) entry to ‘EVP_PKEY_asn1_find_str’
> analyzer/pr94362-1.c:42:31: note: (2) ‘ameth’ is NULL
> analyzer/pr94362-1.c:53:43: note: (3) following ‘true’ branch...
> analyzer/pr94362-1.c:54:31: note: (4) ...to here
> analyzer/pr94362-1.c:54:31: note: (5) calling ‘EVP_PKEY_asn1_get0’ from
> ‘EVP_PKEY_asn1_find_str’
> analyzer/pr94362-1.c:29:29: note: (6) entry to ‘EVP_PKEY_asn1_get0’
> analyzer/pr94362-1.c:32:53: note: (7) ‘0’ is NULL
> analyzer/pr94362-1.c:54:31: note: (8) returning to
> ‘EVP_PKEY_asn1_find_str’ from ‘EVP_PKEY_asn1_get0’
> analyzer/pr94362-1.c:54:31: note: (9) ‘ameth’ is NULL
> analyzer/pr94362-1.c:55:16: note: (10) dereference of NULL ‘ameth’
>
>
> Cheers,
> Benjamin.
>
>
As promised a quick update on that front. I've managed to pinpoint the
difference between the C and C++.
Compiling with -fno-exceptions -O0 the two IPA's seen by the analyzer are
nearly identical, if not for one difference.

const EVP_PKEY_ASN1_METHOD *EVP_PKEY_asn1_find_str(ENGINE **pe, const char
*str,
   int len)
{
  int i;
  const EVP_PKEY_ASN1_METHOD *ameth = (const EVP_PKEY_ASN1_METHOD *) ((void
*)0);
  ...
  for (i = EVP_PKEY_asn1_get_count(); i-- > 0;) {
ameth = EVP_PKEY_asn1_get0(i); /* At this point, i >= 0 */
if (ameth->pkey_flags & 0x1)
  continue;
return ameth;
  }
  ...
}

IPA when compiled by as C:

  :
  i_23 = EVP_PKEY_asn1_get_count ();
  goto ; [INV]

   :
  ameth_27 = EVP_PKEY_asn1_get0 (i_24);
  _2 = ameth_27->pkey_flags;
  _3 = _2 & 1;
  if (_3 != 0)
goto ; [INV]
  else
goto ; [INV]

   :
  // predicted unlikely by continue predictor.
  goto ; [INV]

   :
  _28 = ameth_27;
  goto ; [INV]

   :
  # i_5 = PHI 
  i.4_4 = i_5;
  i_24 = i.4_4 + -1;
  if (i.4_4 > 0)
goto ; [INV]
  else
goto ; [INV]

... and as C++

 :
  i_23 = EVP_PKEY_asn1_get_count ();
  goto ; [INV]

   :
  ameth_28 = EVP_PKEY_asn1_get0 (i_24);
  _2 = ameth_28->pkey_flags;
  _3 = _2 & 1;
  if (_3 != 0)
goto ; [INV]
  else
goto ; [INV]

   :
  // predicted unlikely by continue predictor.
  goto ; [INV]

   :
  _29 = ameth_28;
  goto ; [INV]

   :
  # i_5 = PHI 
  i.5_4 = i_5;
  i_24 = i.5_4 + -1;
  retval.4_25 = i.5_4 > 0;   /* Difference here. C++ stores the
comparison's result before using it */
  if (retval.4_25 != 0)
goto ; [INV]
  else
goto ; [INV]

This slight difference causes i_24 to not be part of an equivalence class
in C++, although it is part of one in C.
Therefore when calling ameth_28 = EVP_PKEY_asn1_get0 (i_24);, i_24 won't
appear as constrained in C++,
although we know it to be positive. Info is lost.
Further down the line, because of this missing ec
constraint_manager::eval_condition will evaluate to UNKNOWN,
and the pending diagnostic path won't be refused but accepted as feasible
in C++.

constraints and equivalence classes in C:

ec2: {(CONJURED(_8 = sk_EVP_PKEY_ASN1_METHOD_num (app_methos.1_2);,
_8)+(int)1)}
ec3: {(int)0 == [m_constant]'0'} /* int since constraining 'int i_24' */
ec4: {CONJURED(_8 = sk_EVP_PKEY_ASN1_METHOD_num (app_methos.1_2);, _8)} /*
This is the ec of i_24 */
ec5: {(int)-1 == [m_constant]'-1'}
constraints:
1: ec5 < ec4
2: ec3 < ec2

... vs C++'s:

ec2: {((CONJURED(_8 = sk_EVP_PKEY_ASN1_METHOD_num (app_methos.1_2);,
_8)+(int)1)>(int)0)}
ec3: {(bool)0 == [m_constant]'0'} /* bool instead of int cuz constraining
'bool retval.4_25' */
constraints:
1: ec2 != ec3

As you can see, i_24 equivalence class does not appear in C++.
I'm considering if in a condition lhs or rhs is a logical binop svalue, we
should unwrap one level, so that the condition's constraint is actually
applied on
the inner svalue.

Cheers,
Benjamin.


Re: analyzer: Weekly update on extending C++ support (3)

2023-08-31 Thread Benjamin Priour via Gcc
Hi David,


On Wed, Aug 30, 2023 at 1:01 AM David Malcolm  wrote:

>
> > ?
>
> Yes; please submit it, so that we can work towards getting what you
> have into trunk.
>
> Given that we don't properly support C++ yet, improvements to C++
> support don't have to be perfect.
>
>
It is next in queue for regstrapping, and great news, I nailed the
"supersedes" issue correctly.
I'll split it in two patches, first one for operator new per se that should
fix PR105948,
and all non user-defined variants of operator new should be supported, with
and without
exceptions enabled. Second will be a fix of PR110830, to prevent
superseding warnings
when their saved_diagnostic path are actually divergent.

My current implementation of the latter is a bit naive but works, thus I'm
trying to improve
its running time.


> >
> > About generalizing tests to C++, I'll soon have a second batch of
> > similar
> > size ready,
> > probably by Monday. I try to find exactly one "real" bug to build
> > each
> > batch around, to not simply
> > have a collection of C made C++ friendly.
>
> (nods)
>
> Thanks for pushing the 1st patch.  I've updated my working copies to
> try to ensure that my new tests go in c-c++-common as far as possible.
>
> >
> > A few questions on that point.
> >
> > Test gcc.dg/analyzer/pr103892.c requires -O2.
> > As such the IPA generated from C and C++ differ,
> > C++ optimization cuts off function argstr_get_word from the IPA,
> > hence reducing the number of SN nodes from the SG.
> > This lesser number of SN is sufficient to make the analysis trips
> > over
> > being too-complex.
> > The current formula for that upper limit is
> > limit = m_sg.num_nodes () * param_analyzer_bb_explosion_factor;
> > Thus shorter programs - such as the analyzer tests - are more likely
> > to
> > be diagnosed as too complex. To avoid false positives perhaps we
> > should
> > add an offset, so that short SG get their chance ?
>
> That's an interesting idea...
>
> > This is just a tweak, and pr103892.c could as well be discarded for
> > C++,
> > divergent IPA between C and C++ are unavoidable at some point, and
> > some
> > tests won't make the transition anyway.
>
> ...but this approach is simpler, so maybe go with that.
>
>
Nods.

>
> > In gcc.dg/analyzer/malloc-1.c:test_50{b,c}, C++ is imprecise as of
> > the
> > memory freed.
> >
> > void test_50b (void)
> > {
> >   free ((void *) test_50a); /* { dg-warning "'free' of '&test_50a'
> > which
> > points to memory not on the heap \\\[CWE-590\\\]" } */
> >   /* { dg-bogus "'free' of 'test_50a' which points to memory not on
> > the
> > heap \\\[CWE-590\\\]" "c++ lacks precision of freed memory" { xfail
> > c++ }
> > .-1 } */
> > }
> >
> > void test_50c (void)
> > {
> >  my_label:
> >   free (&&my_label); /* { dg-warning "'free' of '&my_label' which
> > points to
> > memory not on the heap \\\[CWE-590\\\]" } */
> >   /* { dg-warning "'free' of '&& my_label' which points to memory not
> > on
> > the heap \\\[CWE-590\\\]" "" { xfail c++ } .-1 } */
> > }
> >
> > What do you think of this ?
> > I've checked malloc_state_machine::handle_free_of_non_heap, arg and
> > diag_arg are strictly equal.
> > There might be something to be done with how a tree is transformed
> > into a
> > diagnostic tree by get_diagnostic_tree,
> > but I'll wait for your feedback first.
>
> What does g++ emit for this with your updated test?
>
>
I'm not sure what you meant here.
For a free ((void *) test_50a); gcc emits "'free' of '&test_50a'", whereas
g++ emits "'free' of 'test_50a'",
which is less precise about the actually memory freed. This however only
seems to occur with this
function pointers and labels.


> >
> > Test gcc.dg/analyzer/pr94362-1.c actually has an additional
> > null_deref
> > warning in C++, which is not affected by exceptions
> > or optimizations. I will keep updated on that one.
>
> [...snip...; I see you covered this in a followup]
>
>
The fix worked and even a few other XFAILs pass in other tests.
I am regstrapping the second batch of transitioned test,
following up with the "operator new" stuff.
I had an issue with the regstrap, I don't why but comparing the *.sum
files from my control and patched versions gives me unintelligible output.
It warns me about previous tests having disappeared, even some totally
unrelated to the analyzer, although all of them are still here and manually
running
dejagnu on each of them - one by one - gives the exact same output as
control.

So I'm cleaning up my control and patched folders, and re-regstrap
everything.
Something broke and I don't know what.


> BTW, was there any other work of yours that I need to review? (I know
> about the work on placement-new and testsuite migration)
>
> Thanks again for your work on this
> Dave
>
>
Thanks,
Benjamin.


GCC workshop at university

2023-09-27 Thread Benjamin Priour via Gcc
Hi everyone,

I'm in my final MSc year and figured after this weekend's Q&A
that I could replicate David's workshop on a smaller scale within my
university.

Would that be doable/acceptable ?
Is there any need for special licensing ? What about uploading the
session's recording afterwards ?

To Dave:
If the above is alright could I reuse some of your slides ?
I won't necessarily follow what you did, but some of them would be useful.

Cheers to you all,
Benjamin.