Static Analyzer: correlate state for different region/svalue for reporting

2023-03-15 Thread Pierrick Philippe

Hi everyone,

I have some question regarding the analyzer.
I'm currently working on an fully out-of-tree static analyzer plugin.
I started development on commit tagged as /basepoints/gcc-13/, but 
recently moved my code to a more recent /trunk/ branch (build /20230309/).


From my different experiments and analyzer's log readings, I have the 
intuition that '/ana::state_machine::state/' class is mostly (if not 
only) linked with '/ana::svalue/', i.e. the lvalue of a memory region.
In my analysis case, I would like to also be able to track state for 
some rvalue of a memory region, i.e. '/ana::region/' objects.
So, first question: is there any way to associate and track the state of 
a rvalue, independently of its lvalue?


To try to clarify the question, here's an example:

'''
int __attribute__("__some_attribute__") x = 42;
/* STEP-1
From now on, we consider the state of x as being marked by some_attribute.
But in fact, in the log, we can observe that we'll have something like 
this in the new '/ana::program_state/':

{0x4b955b0: (int)42: marked_state (‘x’)} */

int *y = &x;
/* STEP-2
For analysis purpose, you want to know that from now on, y is pointing 
to marked data.
So you set state of the LHS of the GIMPLE statement (i.e. some ssa_name 
instance of y) accordingly, with a state named 'points-to_marked_data'
and setting 'x' as the origin of the state (in the sense of the argument 
/origin/ from '/ana::sm_context::on_transition/'.

What we now have in the new '/ana::program_state/' is this:
{0x4b9acb0: &x: points-to-marked_data (‘&x’) (origin: 0x4b955b0: (int)42 
(‘x’)), 0x4b955b0: (int)42: marked_state (‘x’)} */


int z = *y;
/* STEP-3
Now you have implicit copy of marked data, and you want to report it.
So you state the LHS of the GIMPLE statement (i.e. some ssa_name 
instance of z) as being marked, with 'y' as the origin.

What we now have in the new '/ana::program_state/' is this:
{0x4b9acb0: &x: points-to-marked_data (‘&x’) (origin: 0x4b955b0: (int)42 
(‘x’)), 0x4b955b0: (int)42: marked_state (‘x’)} */

'''

In STEP-2:

We lost the information saying that the rvalue of y (i.e. y), is 
pointing to marked data.
Only remain the information that its lvalue (i.e. &x), is pointing to 
marked data, which is of course true.


In STEP-3:

No information is added regarding z in the new program_state.
In fact, if one have a closer look at the log, we see that the LHS of 
the GIMPLE statement (the ssa_name instance of z), is already in the 
state of 'marked_data'.
Through the log to the call to '/sm_context::on_transition/' this can be 
seen:

    marked_sm: state transition of ‘z_5’: marked_data -> marked_data

All of this step is somehow leading to confusing diagnostic.
For example, you miss the fact that 'z' is being marked, because no 
event happen as it is somehow aliasing 'x' svalue.

Though it might not be true in case of missed optimization.

Of course, if anyone wants to have a look at my code, I can provide it 
to you (it is not yet publicly available at the moment).

Also, let me know if there is any imprecise information.

Thank you for your time,

Pierrick


Re: Static Analyzer: correlate state for different region/svalue for reporting

2023-03-16 Thread Pierrick Philippe

On 15/03/2023 17:26, David Malcolm wrote:

On Wed, 2023-03-15 at 16:24 +0100, Pierrick Philippe wrote:

Hi everyone,

Hi Pierrick


I have some question regarding the analyzer.
I'm currently working on an fully out-of-tree static analyzer plugin.
I started development on commit tagged as /basepoints/gcc-13/, but
recently moved my code to a more recent /trunk/ branch (build
/20230309/).

  From my different experiments and analyzer's log readings, I have
the
intuition that '/ana::state_machine::state/' class is mostly (if not
only) linked with '/ana::svalue/',

Yes: a program_state (which describes the state at a particular node
along an execution path) contains (among other things) an sm_state_map
per state_machine, and the sm_state_map maps from svalues to
state_machine::states.


i.e. the lvalue of a memory region.
In my analysis case, I would like to also be able to track state for
some rvalue of a memory region, i.e. '/ana::region/' objects.

You're using the terms "lvalue" and "rvalue", but if I'm reading your
email right, you're using them in exactly the opposite way that I
understand them.

I apologize, you are completely right, I swapped their usage...


I think of an "lvalue" as something that can be on the left-hand side
of an assignment: something that can be written to (an ana::region),
whereas an "rvalue" is something that can be on the right-hand side of
an assignment: a pattern of bits (an ana:svalue) that be written into
an lvalue.

An ana::svalue is a pattern of bits (possibly symbolic, such as "the
constant 42" or "the initial value of param x")

An ana::region is a description of a memory location for reads/writes
(possibly symbolic, such as "local var x within this frame", or "the
block reached by dereferencing the initial value of ptr in the frame
above").

Sorry if I've misread things; I'll try to answer the rest of the email
as best as I can...


So, first question: is there any way to associate and track the state
of
a rvalue, independently of its lvalue?

To try to clarify the question, here's an example:

'''
int __attribute__("__some_attribute__") x = 42;
/* STEP-1
  From now on, we consider the state of x as being marked by
some_attribute.
But in fact, in the log, we can observe that we'll have something
like
this in the new '/ana::program_state/':
{0x4b955b0: (int)42: marked_state (‘x’)} */

Yes: sm-state is associated within a program_state with an svalue, in
this case with the constant 42.

There isn't yet a way to instead give sm-state to "x" itself.  I
suppose you could give sm-state to &x (the pointer that points to x is
an instance of region_svalue, which is an svalue), but I haven't tried
this approach myself.


int *y = &x;
/* STEP-2
For analysis purpose, you want to know that from now on, y is
pointing
to marked data.
So you set state of the LHS of the GIMPLE statement (i.e. some
ssa_name
instance of y) accordingly, with a state named 'points-
to_marked_data'
and setting 'x' as the origin of the state (in the sense of the
argument
/origin/ from '/ana::sm_context::on_transition/'.
What we now have in the new '/ana::program_state/' is this:
{0x4b9acb0: &x: points-to-marked_data (‘&x’) (origin: 0x4b955b0:
(int)42
(‘x’)), 0x4b955b0: (int)42: marked_state (‘x’)} */

Yes: you've set the state on the svalue "&x", not on "y".


int z = *y;
/* STEP-3
Now you have implicit copy of marked data, and you want to report it.
So you state the LHS of the GIMPLE statement (i.e. some ssa_name
instance of z) as being marked, with 'y' as the origin.
What we now have in the new '/ana::program_state/' is this:
{0x4b9acb0: &x: points-to-marked_data (‘&x’) (origin: 0x4b955b0:
(int)42
(‘x’)), 0x4b955b0: (int)42: marked_state (‘x’)} */
'''

Presumably the program_state also shows that you have a binding for the
region "z", containing the svalue 42 (this is represented within the
"store", within the region_model within the program_state).


Indeed, there is a binding for region "z" to the svalue 42 within the 
new program state.



In STEP-2:

We lost the information saying that the rvalue of y (i.e. y), is
pointing to marked data.
Only remain the information that its lvalue (i.e. &x), is pointing to
marked data, which is of course true.

Note that the analyzer by default attempts to purge state that it
doesn't think will be needed anymore, so it may have purged the state
of "y" if y isn't going to be read from anymore.  You could try -fno-
analyzer-state-purge to avoid this purging.


Nothing changes when I run it with the -fno-anlyzer-state-purge,
it is still the state of &x which is tracked.


In STEP-

[Tree-SSA] Question from observation, bogus SSA form?

2023-03-16 Thread Pierrick Philippe

Hi everyone,

I was working around with the analyzer, but I usually dump the SSA-tree 
to get a view of the analyzed code.
This is how I noticed something wrong, at least in the sense of the 
definition of SSA form.


I'm using a version of gcc build from a /trunk/ branch (/20230309/).

Here is an example code:

'''
int main(void) {
    int x = 42;
    int * y = &x;
    x = 6;
    return x;
}
'''

And here is the output from -fdump-tree-ssa-vops:

'''
;; Function main (main, funcdef_no=0, decl_uid=2739, cgraph_uid=1, 
symbol_order=0)


int main ()
{
  int * y;
  int x;
  int D.2744;
  int _5;

   :
  # .MEM_2 = VDEF <.MEM_1(D)>
  x = 42;                                           // First assignment 
to var_decl x

  y_3 = &x;
  # .MEM_4 = VDEF <.MEM_2>
  x = 6;                                         // Second 
assignment to var_decl x

  # VUSE <.MEM_4>
  _5 = x;
  # .MEM_6 = VDEF <.MEM_4>
  x ={v} {CLOBBER(eol)};

   :
:
  # VUSE <.MEM_6>
  return _5;

}
'''

The thing is, there is two distinct assignment to the same LHS tree at 
two different gimple statement, which is by definition not supposed to 
happened in SSA form.
Is there any particular reason this happen? Is that because the address 
of x is taken and stored?


I have to precise, I did not dig into the SSA form transformation and am 
a newbie to gcc source code.

So maybe my question is a bit naive or a known issue.

Thank you for your time,

Pierrick


Re: [Tree-SSA] Question from observation, bogus SSA form?

2023-03-17 Thread Pierrick Philippe

On 16/03/2023 17:30, Martin Jambor wrote:

Hello Pierrick,

On Thu, Mar 16 2023, Pierrick Philippe wrote:

Hi everyone,

I was working around with the analyzer, but I usually dump the SSA-tree
to get a view of the analyzed code.
This is how I noticed something wrong, at least in the sense of the
definition of SSA form.

please note that only some DECLs are put into a SSA form in GCC, these
are sometimes referred to as "gimple registers" and you can query the
predicate is_gimple_reg to figure out whether a DECL is one (putting
aside "virtual operands" which are a special construct of alias
analysis, are in an SSA form but the predicate returns false for them
for some reason).

This means that global variables, volatile variables, aggregates,
variables which are not considered aggregates but are nevertheless
partially modified (think insertion into a vector) or variables which
need to live in memory (most probably because their address was taken)
are not put into an SSA form.  It may not be easily possible.


Alright, I understand, but I honestly find it confusing.

I mean, aren't they any passes relying on the pure SSA form properties 
to analyze code?

For example to DSE or DCE.

[stripping]

The thing is, there is two distinct assignment to the same LHS tree at
two different gimple statement, which is by definition not supposed to
happened in SSA form.

I think it is now clear that x is not in SSA form because (at this stage
of the compilation) it is still considered to need to live in memory.

I think that I get your point.
It would be a heavy design change to be able to write it in pure SSA form.

Is there any particular reason this happen? Is that because the address
of x is taken and stored?

I have to precise, I did not dig into the SSA form transformation and am
a newbie to gcc source code.
So maybe my question is a bit naive or a known issue.

No worries, we know these important details are not straightforward when
you see them for the first time.

Good luck with your gcc hacking!


Thanks! :)

Pierrick




[Static Analyzer] Loop handling - False positive for malloc-sm

2023-03-20 Thread Pierrick Philippe

Hi everyone,

I'm still playing around with the analyzer, and wanted to have a look at 
loop handling.

I'm using a build from /trunk/ branch (/20230309/).

Here is my analyzed code:

'''
1| #include 
2| int main(void) {
3|    void * ptr = malloc(sizeof(int));
4|    for (int i = 0; i < 10; i++) {
5|    if (i == 5) free(ptr);
6|    }
7|}
'''

And here, the malloc-sm is reporting a double-free on line 5 with a 
quite confusing output:


'''
./test.c: In function ‘main’:
./test.c:5:21: warning: double-‘free’ of ‘ptr’ [CWE-415] 
[-Wanalyzer-double-free]

    5 | if (i == 5) free(ptr);
   | ^
  ‘main’: events 1-13
    |
    |   3 | void * ptr = malloc(sizeof(int));
    |  |    ^~~
    |  |    |
    |  |    (1) allocated here
    |   4 | for (int i = 0; i < 10; i++) {
    |  |   ~~~
    |  | |    |
    |  | |    (5) ...to here
    |  | (2) following ‘true’ branch (when 
‘i <= 9’)...
    |  | (6) following ‘true’ branch (when 
‘i <= 9’)...
    |  | (9) following ‘true’ branch (when 
‘i <= 9’)...

    |   5 | if (i == 5) free(ptr);
    |  |    ~       ~
    |  |    | |
    |  |    | (8) first ‘free’ here
    |  |    |         (12) ...to here
    |  |    | (13) second ‘free’ here; first 
‘free’ was at (8)

    |  |    (3) ...to here
    |  |    (4) following ‘false’ branch (when ‘i != 5’)...
    |  |    (7) ...to here
    |  |    (10) ...to here
    |  |    (11) following ‘true’ branch (when ‘i == 5’)...
    |
'''

So, I'm guessing that this false positive is due to how the analyzer is 
handling loops.

Which lead to my question: how are loops handled by the analyzer?

Thanks for your time,

Pierrick


Re: [Static Analyzer] Loop handling - False positive for malloc-sm

2023-03-21 Thread Pierrick Philippe

On 21/03/2023 00:30, David Malcolm wrote:

On Mon, 2023-03-20 at 13:28 +0100, Pierrick Philippe wrote:

Hi everyone,

I'm still playing around with the analyzer, and wanted to have a look
at
loop handling.
I'm using a build from /trunk/ branch (/20230309/).

Here is my analyzed code:

'''
1| #include 
2| int main(void) {
3|    void * ptr = malloc(sizeof(int));
4|    for (int i = 0; i < 10; i++) {
5|    if (i == 5) free(ptr);
6|    }
7|}
'''

[stripping]

So, I'm guessing that this false positive is due to how the analyzer
is
handling loops.
Which lead to my question: how are loops handled by the analyzer?

Sadly, the answer is currently "not very well" :/

I implemented my own approach, with a "widening_svalue" subclass of
symbolic value.  This is widening in the Abstract Interpretation sense,
(as opposed to the bitwise operations sense): if I see multiple values
on successive iterations, the widening_svalue tries to simulate that we
know the start value and the direction the variable is moving in.

This doesn't work well; arguably I should rewrite it, perhaps with an
iterator_svalue, though I'm not sure how it ought to work.  Some ideas:

* reuse gcc's existing SSA-based loop analysis, which I believe can
identify SSA names that are iterator variables, figure out their
bounds, and their per-iteration increments, etc.

* rework the program_point or supergraph code to have a notion of "1st
iteration of loop", "2nd iteration of loop", "subsequent iterations",
or similar, so that the analyzer can explore those cases differently
(on the assumption that such iterations hopefully catch the most
interesting bugs)


I see, I don't know if you ever considered allowing state machines to 
deal with loops on their own.
Such as having an API to allow to register a callback to handle loops, 
but not in a mandatory way.

Or having a set of APIs to optionally implement for the analyzer to call.

It would allow state machines to analyze loops with the meaning of their 
inner analysis.


Which could allow them to try to find a fixed point in the loop 
execution which doesn't have
any impact on the program state for that state machine. Kind of like a 
custom loop invariant.
Because depending of the analysis goal of the state machine, you might 
need to symbolically execute the loop
only a few times before reentering the loop and having the entry state 
being the same as the end-of-loop state.


In fact, this could be done directly by the analyzer, and only calling 
state machine APIs for loop handling which still has not reached
such a fixed point in their program state for the analyzed loop, with a 
maximum number of execution fixed by the analyzer to limit execution time.


Does what I'm saying make sense?

In terms of implementation, loop detection can be done by looking for 
strongly connected components (SCCs)

in a function graph having more than one node.
I don't know if this is how it is already done within the analyzer or not?

Thank you for your time,

Pierrick



Re: Static Analyzer: correlate state for different region/svalue for reporting

2023-03-21 Thread Pierrick Philippe

On 16/03/2023 14:44, David Malcolm wrote:

On Thu, 2023-03-16 at 09:54 +0100, Pierrick Philippe wrote:

On 15/03/2023 17:26, David Malcolm wrote:

On Wed, 2023-03-15 at 16:24 +0100, Pierrick Philippe wrote:

[stripping]

So, first question: is there any way to associate and track the
state
of
a rvalue, independently of its lvalue?

To try to clarify the question, here's an example:

'''
int __attribute__("__some_attribute__") x = 42;
/* STEP-1
   From now on, we consider the state of x as being marked by
some_attribute.
But in fact, in the log, we can observe that we'll have something
like
this in the new '/ana::program_state/':
{0x4b955b0: (int)42: marked_state (‘x’)} */

[stripping]

int *y = &x;
/* STEP-2
For analysis purpose, you want to know that from now on, y is
pointing
to marked data.
So you set state of the LHS of the GIMPLE statement (i.e. some
ssa_name
instance of y) accordingly, with a state named 'points-
to_marked_data'
and setting 'x' as the origin of the state (in the sense of the
argument
/origin/ from '/ana::sm_context::on_transition/'.
What we now have in the new '/ana::program_state/' is this:
{0x4b9acb0: &x: points-to-marked_data (‘&x’) (origin: 0x4b955b0:
(int)42
(‘x’)), 0x4b955b0: (int)42: marked_state (‘x’)} */

Yes: you've set the state on the svalue "&x", not on "y".


int z = *y;
/* STEP-3
Now you have implicit copy of marked data, and you want to report
it.
So you state the LHS of the GIMPLE statement (i.e. some ssa_name
instance of z) as being marked, with 'y' as the origin.
What we now have in the new '/ana::program_state/' is this:
{0x4b9acb0: &x: points-to-marked_data (‘&x’) (origin: 0x4b955b0:
(int)42
(‘x’)), 0x4b955b0: (int)42: marked_state (‘x’)} */
'''

Presumably the program_state also shows that you have a binding for
the
region "z", containing the svalue 42 (this is represented within
the
"store", within the region_model within the program_state).


[stripping]

This is an update about tracking state of svalues instead of region for 
other kind of variables than pointers.


If you consider the following code:

'''
int __attribute__((__taint__)) x = 42;
/* Program state:
{0x4b955b0: (int)42: marked_state (‘x’)}
*/
int y = 42;
// Program state unchanged
if (y);
/* When querying the sm_context about the state of y,
it returns it as being in a "marked_state", because its svalue is the 
same as x's one.

Even though no call to change y's state has been made.
And here it triggers a diagnostic for my analysis.
*/
'''

I understand way better now the internals of the analyzer regarding the 
state's tracking.
I do completely understand now, why you've said you've mainly designed 
it for pointers, because this allow you to avoid to do some points-to 
analysis, by associating state with pointer's svalues instead of 
pointer's region.


But as you can see in the above code example, it has its drawback for 
analyzing variables with a different semantics, such as integer type 
variable.


I will have to modified the analyzer's code to add a way for state 
machine to ask the analyzer to track region's state instead of svalue's 
state to be able to keep using it with my analysis plugin.


Do you think it would be interesting having such features merged within 
the analyzer?
In any case, I'll start to work on it over the last /trunk/ branch, 
within an appropriate branch.


Thank you for your time,

Pierrick


Re: [Static Analyzer] Loop handling - False positive for malloc-sm

2023-03-23 Thread Pierrick Philippe

On 22/03/2023 19:19, David Malcolm wrote:

On Tue, 2023-03-21 at 09:21 +0100, Pierrick Philippe wrote:

[stripping]

In fact, this could be done directly by the analyzer, and only
calling
state machine APIs for loop handling which still has not reached
such a fixed point in their program state for the analyzed loop, with
a
maximum number of execution fixed by the analyzer to limit execution
time.

Does what I'm saying make sense?

I think so, though I'm not sure how it would work in practice.
Consider e.g.

   for (int i = 0; i < n; i++)
  head = prepend_node (head, i);

which builds a chain of N dynamically-allocated nodes in a linked list.


Well, that would be a case where the loop's analysis will depend of the 
state machine.
If we consider the malloc-sm, it would allow it to track as different 
pointers each allocated pointers, until the limit of symbolic execution 
imposed by the analyzer is reached, are the svalue of N if it is a known 
integer at the current analysis point.


For other, such as a the file-sm, it would only be needed to 
symbolically execute it once, assuming prepend_node() is not opening any 
files.
So this state machine would not have to be executed more than once on 
the loop at this program point by the analyzer.


I think the different steps for such a different cases of loop analysis, 
is somehow using the second point of the RFE you shared above.


The "algorithm" I came with when thinking about it looks like this.
Of course, I'm definitely not an expert on the analyzer, so it is 
possibly not feasible.


* Detect loop, and try to get the termination constraint (possibly 
reduced if possible).

* Iterate on the loops' node N:
    * If N is the loop's first node:
        * Check if the actual program state is in a sufficient state to 
satisfy the loop's termination constraint,

            If so, stop analyzing the loop
        * Otherwise, check if the maximum number of symbolic execution 
fixed by the analyzer is reached,

            If so, stop analyzing the loop
        * Otherwise, keep going
    * Call every sm still impacting their program state map on node N

This should work for loops iterating on integer, for other kind of 
loops, it might be trickier though.



In terms of implementation, loop detection can be done by looking for
strongly connected components (SCCs)
in a function graph having more than one node.
I don't know if this is how it is already done within the analyzer or
not?

It isn't yet done in the analyzer, but as noted above there is code in
GCC that already does that (in cfgloop.{h,cc}).


I definitely have to look at this files.

Thank you for your time,

Pierrick



[analyzer][tree] Get the adress of a specific tree

2023-05-24 Thread Pierrick Philippe

Hi all,

I am working around array using a plugin to the analyzer.
And I face a problem here, I would like to be able to build a 
transformed representation of an element of the array from its subscript 
representation to its address representation (vice versa).
To image what I'm saying, I'm looking for a way to pass from 
't[(int)index]' to '&t + index * sizeof(element)', and from '&t + 
offset' to 't[offset / sizeof(element)]'.


From now, I'll use:

   - "subscript form" for the 't[index]' representation.
   - "address form" for the '&t + offset' representation.

I tried to do it manually:

   - only using the analyzer API (using
   /ana::region_model_manager::get_or_create_binop/) to get the
   corresponding /svalue/ (i.e. address form) of a subscript form.
   - by first building the /tree/ in address form using GIMPLE API
   (/fold_buildn/) and using analyzer API
   (/ana::region_model::get_rvalue/) to get its corresponding /svalue./

I managed to do it from the /tree/ in the subscript form, except that I 
can't use it to check if this /svalue/ is tracked within the current 
/sm_state_map/ because it's hash would be different than from the one 
potentially within the map, although they are semantically equivalent.


Is there by any chance an API within the analyzer which allow to go over 
the live svalues?


Cheers,

--
Pierrick


[analyzer] Comparing svalues

2023-06-01 Thread Pierrick Philippe

Hi David, hi all,

I'm working on a plugin for the analyzer, and basically I've reached a 
point where I need to compare svalues.
For the need of my analysis, I've modified the analyzer to be able to 
track for region in some specific cases, so I modified the 
implementation of the /sm_state_map/.
If anyone want to see my modifications, I would be glad to send it to 
you (not yet on a public repository).


I'm trying to handle all the different (only defined behavior) 
semantically correct ways to manipulate arrays.

To illustrate my words, here is an example:

   int t[4] = 0;
   t[2] = some_var; // valid and represented in GIMPLE by a single
   gassign stmt with LHS being an ARRAY_REF
   *(t+2) = some_var; // valid and represented in GIMPLE by two
   distincts gassign stmt with LHS_1 being a SSA_NAME and LHS_2 being a
   MEM_REF
   int *y = t + 1;
   *(y+1) = some_var; // valid and represented in GIMPLE by two
   distincts gassign stmt with LHS_1 being a SSA_NAME and LHS_2 being a
   MEM_REF

In this example, the same memory is modified and correspond to 't[2]'.
What I'm trying to do is to determine have a correlation between the 
region 't[2]' and the svalue '&t + 2 * sizeof(element)'.


I've manage to pass from the tree '&t + 2 * sizeof(element)' to the 
corresponding region 't[2]' using the 
/ana::region_model_manager::get_element_region/ API.
So that if I have the region corresponding to 't[2]' in the 
/sm_state_map/, it is correctly found within the inner /hash_map//region *, reg_entry_t>/.


It gets weird when working from going to the tree 't[2]' to the 
corresponding svalue '&t + 2 * sizeof(element)'.

Basically for now, I used several approaches:

   - I tried building the correspond tree using /buildN/ GIMPLE API and
   then the /ana::region_model::get_rvalue/ API, I did had a result
   being dumped as exactly what I needed, but the lookup (through
   /ana::sm_context::get_state/) within the inner /hash_map / of /sm_state_map/ was failing even though the
   same svalue was present in the /hash_map.
   /I tried to understand what was happening, and basically, it seems
   that the two svalues does not have the same address, though the same
   hash, leading to the lookup failure.

   - Right now, I am doing exactly the same to obtain the corresponding
   svalue, but instead of using /ana::sm_context::get_state/, I am
   iterating over all the live_values obtained through
   /ana::region_model::get_reachable_svalues/ until I find the same
   svalue in terms of semantics. Though, this is failing because there
   is currently no way to compare svalue's semantic.

So, basically I'm kind of stuck here and I have no idea how to properly 
go from a tree representation to its svalue/region one.

To explicit as much as possible I'm trying to do this:

   - Pass from 'tree t[2]' to 'svalue &t + 2 * sizeof(element)'; ->
   that part does not work

   - Pass from 'tree t + 2' to 'region t[2]'; -> that part is working

Would you have any idea about an API I would have missed or anything else?
I can definitely share my code if anyone want to have a look at it.

Thanks for reading,
Cheers,

Pierrick


[gimple-ssa] Get the gimple corresponding to the definition of a VAR_DECL

2023-06-27 Thread Pierrick Philippe

Hi everyone,

I'm trying to get the gimple * associated to the definition of a given 
var_decl.
Basically, I am iterating over the locals of a function (through the 
local_decls member) and I need to be able to get the gimple * of its 
definition within the function's gimple_seq.


Does any of you have an idea on how this could be achieved ?

Thanks,

Pierrick



Re: [gimple-ssa] Get the gimple corresponding to the definition of a VAR_DECL

2023-06-27 Thread Pierrick Philippe

On 27/06/2023 11:42, Richard Biener wrote:

On Tue, Jun 27, 2023 at 11:36 AM Pierrick Philippe
  wrote:

Hi everyone,

I'm trying to get the gimple * associated to the definition of a given
var_decl.
Basically, I am iterating over the locals of a function (through the
local_decls member) and I need to be able to get the gimple * of its
definition within the function's gimple_seq.

Does any of you have an idea on how this could be achieved ?

You need to walk the body of the function looking for defining
stmts.  Note the VAR_DECL might be in SSA form in which
case you can also walk all SSA names and check those
whose SSA_NAME_VAR is the specific VAR_DECL.  But then
SSA names can "lose" their associated decks so when a
variable is in SSA form the set of original assignments cannot
always be recovered easily.


Thank you for your answer Richard.
I'll have a look into that.

My main problem is regarding uninitialized definition, but still not 
being considered undefined behavior.

For example, the following code:

   int x;
   int *y = &x;
   *y = 6;

What I'm doing is basically looking at each gimple statement if the lhs 
has a given attribute for the purpose of the analysis I'm trying to perform.

To precise, I'm plugged into the analyzer, so an out-of-tree plugin.

But in the example above, the definition of x is not really within the 
gimple_seq of the function as it is never directly assigned.


Pierrick



Richard.


Thanks,

Pierrick



[Analyzer] Transform a region 't[i]' to its svalue form '&t + (i * element_size)'

2023-11-16 Thread Pierrick Philippe

Hi all, hi David,

I am trying to use the analyzer API to transform an element_region (such 
as 't[1]') to a binop_svalue with an inner region_svalue (such as '&t + 
4' in case of integers array) for analysis purpose.


I managed to do it the other way around (i.e. from a binop_svalue to an 
element_region) using the analyzer API (code is in attached file 
offset_to_elm.cc).

And everything is working as intended within the analyzer and my analysis.

The problem I'm having here is probably due to a mistake I'm doing on 
some argument to a function call, but I cannot see it to be honest (code 
misbehaving is in elm_to_offset.cc).


I attached a commented sample code I'd like to be able to analyze 
(test.c), alongside the commented GIMPLE code seen by the analyzer 
(test.c.075i.analyzer obtained through '-fdump-ipa-analyzer').

Feel free to ask question if needed of course.

I manage to rebuild a binop_sval from an element_region (code is in 
attached file elm_to_offset.cc), but resulting object is not the same 
than the one already within the state_map of my state machine.
To be clear, the svalue's pointer resulting from the code in 
elm_to_offset.cc is deifferent, though when they're logged with simple 
set to false, no difference seem to exist.
And I really do have the intuition that the problem might be related to 
the call to region_model_manager::get_ptr_svalue on elm_to_offset.cc:17.
I do think that the key is not find within 
region_model_manager::m_pointer_values_map is not found and the call to 
region_svalue constructor is performed within 
region_model_manager::get_ptr_svalue implementation.

But still, I have no idea why.

I did modified the analyzer state_map class to be able to also track 
regions states alongside svalues one (working on a patch, code is not 
respecting GCC's coding standard so far and most probably not optimized).


Any idea on what I'm doing wrong here would really be appreciated.

Beside this, I do think that enhancing the analyzer by allowing to track 
regions states could allow for a broader set of possible analysis. What 
do you think ?


Thank you,

Pierrick

P.S.: @David, sorry for double send. I did a mistake in the gcc's 
mailing list address.const svalue *elm_to_offset(sm_context *sm_ctx, const region *reg, logger *logger) {

  const svalue* res = nullptr;

  if (logger)
LOG_SCOPE(logger);

  auto model = sm_ctx->get_new_program_state()->m_region_model;
  auto mgr = model->get_manager();

  bit_offset_t offset;
  const svalue *offset_sval = nullptr;
  if (reg->get_relative_concrete_offset(&offset)) {
auto offset_byte = offset / BITS_PER_UNIT;
offset_sval = mgr->get_or_create_constant_svalue(wide_int_to_tree(sizetype, offset_byte));
auto base = reg->get_base_region();
auto base_sval = mgr->get_ptr_svalue(TYPE_POINTER_TO(base->get_type()), base);
res = mgr->get_or_create_binop(base->get_type(), POINTER_PLUS_EXPR, base_sval, offset_sval);
  }
  // TODO: symbolic offset
  else if (logger)
  logger->log("Offset is symbolic, not implemented.");

  if (logger) {
logger->start_log_line();
logger->log_partial("res: %p | ", res);
res ? res->dump_to_pp(logger->get_printer(), false) : logger->log_partial("nullptr");
logger->log_partial(" | ");
res ? res->dump_to_pp(logger->get_printer(), true) : logger->log_partial("nullptr");
logger->end_log_line();
  }

  return res;
}
const region *offset_to_elm(sm_context * sm_ctx, const svalue *sval, logger *logger) {

  const region * res = nullptr;

  if (logger)
LOG_SCOPE(logger);

  auto model = sm_ctx->get_new_program_state()->m_region_model;

  if (const region *deref = model->deref_rvalue(sval, NULL_TREE, nullptr)) {
if (logger) {
  logger->start_log_line();
  logger->log_partial("deref: ");
  deref->dump_to_pp(logger->get_printer(), false);
  logger->end_log_line();
}
auto base = deref->get_base_region();
bit_offset_t offset;
bit_size_t size;
if (deref->get_relative_concrete_offset(&offset)
  && deref->get_bit_size(&size)) {
  auto index = offset / size;
  auto model_mgr = model->get_manager();
  auto index_sized = wide_int_to_tree(sizetype, index);
  auto sval_index_sized = model_mgr->get_or_create_constant_svalue(index_sized);
  res = model_mgr->get_element_region(base, TREE_TYPE(base->get_type()), sval_index_sized);
}
// TODO: symbolic offset
else if (logger)
logger->log("Offset is symbolic, not implemented.");
  }

  if (logger) {
logger->start_log_line();
logger->log_partial("input: %p | ", sval);
sval->dump_to_pp(logger->get_printer(), true);
logger->log_partial(" | ");
sval->dump_to_pp(logger->get_printer(), false);
logger->end_log_line();
logger->start_log_line();
logger->log_partial("res: ");
res ? res->dump_to_pp(logger->get_printer(), true) : logger->log_partial("nullptr");
logger->end_log_line();
  }

  return res;
}
void some_func(void) 

[Tree][Static Analyzer] Tree representing types and svalues type

2024-01-16 Thread Pierrick Philippe
Hi David, hi all,

I was playing along with APIs from the Static Analyzer and encountered a
segfault in gcc/tree.cc:5068 (i.e. in function build2 and failure is due
to a
gcc_assert call), after a call to
ana::region_model::get_representative_tree.

>From my debugging of the problem, I realized that the svalue which I
built with
the SA API as the following from an element_region (C sample):

    const ana::region *base = elm_reg->get_base_region();
    // base_ptr is the svalue passed to get_representative_tree
    const ana::svalue *base_ptr = base->get_ptr_svalue(
                                                                  
TYPE_POINTER_TO(base->get_type),
                                                                   base);

I realized that the SA resulting in the call to get_ptr_svalue has
NULL_TREE as
type (return by get_type on base_ptr).  And this is because TYPE_POINTER_TO
(base->get_type) is returning a NULL_TREE.  I found my way around by calling
build_pointer_type(base->get_type()) which is currently building the
expecting
type.

But from this, two questions raised in my mind:

1. Is it coherent for the ana::region_model_manager::get_ptr_svalue to
return a sval with a null type?

2. Can a tree view as a tree_type_common member have an uninitialized
pointer_to member?  I think, but definitely not sure) that the member is not
initialized here by some part of the compiler, because I think that the
inner
type of regions and svalues are build by the SA by using the TREE_TYPE macro
directly from the GIMPLE.  I ask this question to know if this is a bug
and I
should try to do a minimal example out of it, or if it is a intended
behavior
for any reason from the compiler.

Thank you for reading me,

Pierrick



Re: [Tree][Static Analyzer] Tree representing types and svalues type

2024-01-18 Thread Pierrick Philippe
On 17/01/2024 23:52, David Malcolm wrote:
> On Tue, 2024-01-16 at 15:44 +0100, Pierrick Philippe wrote:
>> Hi David, hi all,
> Hi Pierrick.
First, thanks for you answer.
[stripping]
> I confess that I've been quite sloppy in places with types in the
> analyzer, keeping track of them when it's easy, but letting them be
> NULL in places where getting a type was hard, or apparently not
> possible.
>
> Sorry about this.
>
> I think my implementation is a bit muddled about what an svalue
> actually is.
>
> Informally, I think of an svalue as a partial function from bit offsets
> to values of the bits.  For example:
> * region_svalue: e.g. bit offsets 0-63 are the pointer to REG;
> undefined elsewhere
> * constant_svalue: e.g. given (int)42, bit offsets 0-31 are the
> constant 42, encoded in the target's usual way; undefined elsewhere
> * unknown_svalue: undefined everywhere
> ...etc, and the above definition doesn't need svalues to have a type.
>
> The above is rather vague: how should this work for unaryop_svalue and
> binop_svalue; these need to have types, so that we can think of them as
> e.g. 
> * "bits 0-31 are the result of a signed 32-bit int addition of the
> values of bits 0-32 from X and Y when considered as signed 32-bit int".
> * "bits 0-31 are the result of zero-extending bits 0-7 of X and
> treating the result as uint32_t".

I do understand, but here, the thing is that the analyzer built a
region_svalue without any type. And thought it was a bit weird.

But I do get your point here.

> In the meantime, you can't rely on an svalue having a non-NULL type,
> and hence it's often not possible to get a representative tree from an
> svalue, since tree expressions must have non-NULL types.

But then, shouldn't the analyzer have a defensive approach in
get_representative_tree and return a NULL_TREE in such cases? Because
the segfault is happening behind the scene. Maybe it shouldn't, I have
no idea what would be the best approach here.

>> But from this, two questions raised in my mind:
>>
>> 1. Is it coherent for the ana::region_model_manager::get_ptr_svalue
>> to
>> return a sval with a null type?
> It depends what you mean by "coherent"; given the sloppy approach
> above, the answer is "yes".
(nod)
>> 2. Can a tree view as a tree_type_common member have an uninitialized
>> pointer_to member?
> It shouldn't be uninitialized.  Perhaps the tree node isn't the right
> kind?  You should use the TYPE_POINTER_TO macro to access the field,
> which will use TYPE_CHECK to check that the node is a type in a debug
> build of the compiler.
Well, I'm pretty sure I didn't build it myself and only used macros
and API functions to get that type. And this is what is confusing me.

[stripping]

> Sounds like a bug; please try to construct a minimal example.
Sure, I'll try to build a minimal example.
Will have to use a plugin to do so though, hope that won't be a problem.

Thanks,
Pierrick

> Thanks
> Dave
>
>> Thank you for reading me,
>>
>> Pierrick


Using std types and functions within GCC

2024-03-14 Thread Pierrick Philippe
Hi all,

I was wondering, is there any conventions or guidelines regarding the
usage of types and/or functions coming from the C++ std library within
the compiler?

To explicit a bit more, I am working on modification on the analyzer and
might need to use a pair.

Thank you for your time,

Pierrick


[gimple-ssa] result_decl and ssa_name

2024-04-05 Thread Pierrick Philippe
Hi all,

I do have a question regarding ssa_name and result_decl.

For example on the following gimple function:

int f ()
{
  int x;
  int D.2747;
  int _2;

   :
  x_1 = 42;
  _2 = x_1;

   :
:
  return _2;

}

On the above example, using the macro SSA_NAME_VAR() on _2 does not
yield anything usable.
Neither to call ssa_default_def() on the result of the result_decl
obtain through macro DECL_RESULT().

Is there a way to get the ssa_name corresponding to the result_decl of a
function obtained through the use of macro DECL_RESULT() on a fn_decl?
And/or the other way around? I.e., from the returned ssa_name of a
function to the result_decl of that function?

I totally might be missing something here, but I cannot figure out what.

Thanks for your time,

Pierrick



Re: [gimple-ssa] result_decl and ssa_name

2024-04-05 Thread Pierrick Philippe
On 05/04/2024 14:46, Richard Biener wrote:
> On Fri, Apr 5, 2024 at 1:59 PM Pierrick Philippe
>  wrote:
>> Hi all,
>>
>> I do have a question regarding ssa_name and result_decl.
>>
>> For example on the following gimple function:
>>
>> int f ()
>> {
>>   int x;
>>   int D.2747;
>>   int _2;
>>
>>:
>>   x_1 = 42;
>>   _2 = x_1;
>>
>>:
>> :
>>   return _2;
>>
>> }
>>
>> On the above example, using the macro SSA_NAME_VAR() on _2 does not
>> yield anything usable.
>> Neither to call ssa_default_def() on the result of the result_decl
>> obtain through macro DECL_RESULT().
>>
>> Is there a way to get the ssa_name corresponding to the result_decl of a
>> function obtained through the use of macro DECL_RESULT() on a fn_decl?
>> And/or the other way around? I.e., from the returned ssa_name of a
>> function to the result_decl of that function?
>>
>> I totally might be missing something here, but I cannot figure out what.
> DECL_RESULT isn't always used (as in your example).  Not all SSA names
> have corresponding declarations, we have "anonymous" SSA names which
> have a NULL_TREE SSA_NAME_VAR (such as the _2 in your example).
I see, that makes so much more sense to me now.
> What do you try to find in the end?  If you want to find all returns you can
> walk predecessors of EXIT_BLOCK and look at their last stmt whether they
> are greturn statements.

I am implementing a state_machine within the analyzer, and I am trying
to understand where would be the best place to propagate the state of
the return value.
I intuitively thought it would be best to do so in the
state_machine::on_pop_frame() method, which is called by the analyzer
between the two frames of the caller and the callee. What I do have
access to is the struct function of the callee/caller, the gcall
instruction in the caller and the callee have been processed by my
analysis.

And to illustrate, here I do have the _2 ssa_name and its state which I
know in that case should be propagate to the lhs of the caller gcall
instruction.

Again I might be taking this in a wrong way.

> Richard.
>> Thanks for your time,
>>
>> Pierrick


Re: [gimple-ssa] result_decl and ssa_name

2024-04-08 Thread Pierrick Philippe
On 06/04/2024 14:53, Richard Biener wrote:
> On Fri, Apr 5, 2024 at 3:44 PM Pierrick Philippe
>  wrote:
>> On 05/04/2024 14:46, Richard Biener wrote:
>>
>> On Fri, Apr 5, 2024 at 1:59 PM Pierrick Philippe
>>  wrote:
>>
>> Hi all,
>>
>> I do have a question regarding ssa_name and result_decl.
>>
>> For example on the following gimple function:
>>
>> int f ()
>> {
>>   int x;
>>   int D.2747;
>>   int _2;
>>
>>:
>>   x_1 = 42;
>>   _2 = x_1;
>>
>>:
>> :
>>   return _2;
>>
>> }
>>
>> On the above example, using the macro SSA_NAME_VAR() on _2 does not
>> yield anything usable.
>> Neither to call ssa_default_def() on the result of the result_decl
>> obtain through macro DECL_RESULT().
>>
>> Is there a way to get the ssa_name corresponding to the result_decl of a
>> function obtained through the use of macro DECL_RESULT() on a fn_decl?
>> And/or the other way around? I.e., from the returned ssa_name of a
>> function to the result_decl of that function?
>>
>> I totally might be missing something here, but I cannot figure out what.
>>
>> DECL_RESULT isn't always used (as in your example).  Not all SSA names
>> have corresponding declarations, we have "anonymous" SSA names which
>> have a NULL_TREE SSA_NAME_VAR (such as the _2 in your example).
>>
>> I see, that makes so much more sense to me now.
>>
>> What do you try to find in the end?  If you want to find all returns you can
>> walk predecessors of EXIT_BLOCK and look at their last stmt whether they
>> are greturn statements.
>>
>> I am implementing a state_machine within the analyzer, and I am trying to 
>> understand where would be the best place to propagate the state of the 
>> return value.
>> I intuitively thought it would be best to do so in the 
>> state_machine::on_pop_frame() method, which is called by the analyzer 
>> between the two frames of the caller and the callee. What I do have access 
>> to is the struct function of the callee/caller, the gcall instruction in the 
>> caller and the callee have been processed by my analysis.
> It might make sense to record the analysis of the return value in
> meta-data that the analyzer keeps and access it that way.
> Other than that you'd have to do it the way I said with finding the
> greturn stmts again and look at what is returned there.

That is what I had in mind, thanks for answering me.

I do have another question though, how do you obtain the decl_context of
such an ssa_name? The DECL_CONTEXT macro is failing during the tree
check and I ha ve no idea how to know where a given ssa_name is declared
without accessing its inner var (through SSA_NAME_VAR). And this
operation is failing on _i ssa_name trees.

>> And to illustrate, here I do have the _2 ssa_name and its state which I know 
>> in that case should be propagate to the lhs of the caller gcall instruction.
>>
>> Again I might be taking this in a wrong way.
>>
>> Richard.
>>
>> Thanks for your time,
>>
>> Pierrick