Static Analyzer: correlate state for different region/svalue for reporting
Hi everyone, I have some question regarding the analyzer. I'm currently working on an fully out-of-tree static analyzer plugin. I started development on commit tagged as /basepoints/gcc-13/, but recently moved my code to a more recent /trunk/ branch (build /20230309/). From my different experiments and analyzer's log readings, I have the intuition that '/ana::state_machine::state/' class is mostly (if not only) linked with '/ana::svalue/', i.e. the lvalue of a memory region. In my analysis case, I would like to also be able to track state for some rvalue of a memory region, i.e. '/ana::region/' objects. So, first question: is there any way to associate and track the state of a rvalue, independently of its lvalue? To try to clarify the question, here's an example: ''' int __attribute__("__some_attribute__") x = 42; /* STEP-1 From now on, we consider the state of x as being marked by some_attribute. But in fact, in the log, we can observe that we'll have something like this in the new '/ana::program_state/': {0x4b955b0: (int)42: marked_state (‘x’)} */ int *y = &x; /* STEP-2 For analysis purpose, you want to know that from now on, y is pointing to marked data. So you set state of the LHS of the GIMPLE statement (i.e. some ssa_name instance of y) accordingly, with a state named 'points-to_marked_data' and setting 'x' as the origin of the state (in the sense of the argument /origin/ from '/ana::sm_context::on_transition/'. What we now have in the new '/ana::program_state/' is this: {0x4b9acb0: &x: points-to-marked_data (‘&x’) (origin: 0x4b955b0: (int)42 (‘x’)), 0x4b955b0: (int)42: marked_state (‘x’)} */ int z = *y; /* STEP-3 Now you have implicit copy of marked data, and you want to report it. So you state the LHS of the GIMPLE statement (i.e. some ssa_name instance of z) as being marked, with 'y' as the origin. What we now have in the new '/ana::program_state/' is this: {0x4b9acb0: &x: points-to-marked_data (‘&x’) (origin: 0x4b955b0: (int)42 (‘x’)), 0x4b955b0: (int)42: marked_state (‘x’)} */ ''' In STEP-2: We lost the information saying that the rvalue of y (i.e. y), is pointing to marked data. Only remain the information that its lvalue (i.e. &x), is pointing to marked data, which is of course true. In STEP-3: No information is added regarding z in the new program_state. In fact, if one have a closer look at the log, we see that the LHS of the GIMPLE statement (the ssa_name instance of z), is already in the state of 'marked_data'. Through the log to the call to '/sm_context::on_transition/' this can be seen: marked_sm: state transition of ‘z_5’: marked_data -> marked_data All of this step is somehow leading to confusing diagnostic. For example, you miss the fact that 'z' is being marked, because no event happen as it is somehow aliasing 'x' svalue. Though it might not be true in case of missed optimization. Of course, if anyone wants to have a look at my code, I can provide it to you (it is not yet publicly available at the moment). Also, let me know if there is any imprecise information. Thank you for your time, Pierrick
Re: Static Analyzer: correlate state for different region/svalue for reporting
On 15/03/2023 17:26, David Malcolm wrote: On Wed, 2023-03-15 at 16:24 +0100, Pierrick Philippe wrote: Hi everyone, Hi Pierrick I have some question regarding the analyzer. I'm currently working on an fully out-of-tree static analyzer plugin. I started development on commit tagged as /basepoints/gcc-13/, but recently moved my code to a more recent /trunk/ branch (build /20230309/). From my different experiments and analyzer's log readings, I have the intuition that '/ana::state_machine::state/' class is mostly (if not only) linked with '/ana::svalue/', Yes: a program_state (which describes the state at a particular node along an execution path) contains (among other things) an sm_state_map per state_machine, and the sm_state_map maps from svalues to state_machine::states. i.e. the lvalue of a memory region. In my analysis case, I would like to also be able to track state for some rvalue of a memory region, i.e. '/ana::region/' objects. You're using the terms "lvalue" and "rvalue", but if I'm reading your email right, you're using them in exactly the opposite way that I understand them. I apologize, you are completely right, I swapped their usage... I think of an "lvalue" as something that can be on the left-hand side of an assignment: something that can be written to (an ana::region), whereas an "rvalue" is something that can be on the right-hand side of an assignment: a pattern of bits (an ana:svalue) that be written into an lvalue. An ana::svalue is a pattern of bits (possibly symbolic, such as "the constant 42" or "the initial value of param x") An ana::region is a description of a memory location for reads/writes (possibly symbolic, such as "local var x within this frame", or "the block reached by dereferencing the initial value of ptr in the frame above"). Sorry if I've misread things; I'll try to answer the rest of the email as best as I can... So, first question: is there any way to associate and track the state of a rvalue, independently of its lvalue? To try to clarify the question, here's an example: ''' int __attribute__("__some_attribute__") x = 42; /* STEP-1 From now on, we consider the state of x as being marked by some_attribute. But in fact, in the log, we can observe that we'll have something like this in the new '/ana::program_state/': {0x4b955b0: (int)42: marked_state (‘x’)} */ Yes: sm-state is associated within a program_state with an svalue, in this case with the constant 42. There isn't yet a way to instead give sm-state to "x" itself. I suppose you could give sm-state to &x (the pointer that points to x is an instance of region_svalue, which is an svalue), but I haven't tried this approach myself. int *y = &x; /* STEP-2 For analysis purpose, you want to know that from now on, y is pointing to marked data. So you set state of the LHS of the GIMPLE statement (i.e. some ssa_name instance of y) accordingly, with a state named 'points- to_marked_data' and setting 'x' as the origin of the state (in the sense of the argument /origin/ from '/ana::sm_context::on_transition/'. What we now have in the new '/ana::program_state/' is this: {0x4b9acb0: &x: points-to-marked_data (‘&x’) (origin: 0x4b955b0: (int)42 (‘x’)), 0x4b955b0: (int)42: marked_state (‘x’)} */ Yes: you've set the state on the svalue "&x", not on "y". int z = *y; /* STEP-3 Now you have implicit copy of marked data, and you want to report it. So you state the LHS of the GIMPLE statement (i.e. some ssa_name instance of z) as being marked, with 'y' as the origin. What we now have in the new '/ana::program_state/' is this: {0x4b9acb0: &x: points-to-marked_data (‘&x’) (origin: 0x4b955b0: (int)42 (‘x’)), 0x4b955b0: (int)42: marked_state (‘x’)} */ ''' Presumably the program_state also shows that you have a binding for the region "z", containing the svalue 42 (this is represented within the "store", within the region_model within the program_state). Indeed, there is a binding for region "z" to the svalue 42 within the new program state. In STEP-2: We lost the information saying that the rvalue of y (i.e. y), is pointing to marked data. Only remain the information that its lvalue (i.e. &x), is pointing to marked data, which is of course true. Note that the analyzer by default attempts to purge state that it doesn't think will be needed anymore, so it may have purged the state of "y" if y isn't going to be read from anymore. You could try -fno- analyzer-state-purge to avoid this purging. Nothing changes when I run it with the -fno-anlyzer-state-purge, it is still the state of &x which is tracked. In STEP-
[Tree-SSA] Question from observation, bogus SSA form?
Hi everyone, I was working around with the analyzer, but I usually dump the SSA-tree to get a view of the analyzed code. This is how I noticed something wrong, at least in the sense of the definition of SSA form. I'm using a version of gcc build from a /trunk/ branch (/20230309/). Here is an example code: ''' int main(void) { int x = 42; int * y = &x; x = 6; return x; } ''' And here is the output from -fdump-tree-ssa-vops: ''' ;; Function main (main, funcdef_no=0, decl_uid=2739, cgraph_uid=1, symbol_order=0) int main () { int * y; int x; int D.2744; int _5; : # .MEM_2 = VDEF <.MEM_1(D)> x = 42; // First assignment to var_decl x y_3 = &x; # .MEM_4 = VDEF <.MEM_2> x = 6; // Second assignment to var_decl x # VUSE <.MEM_4> _5 = x; # .MEM_6 = VDEF <.MEM_4> x ={v} {CLOBBER(eol)}; : : # VUSE <.MEM_6> return _5; } ''' The thing is, there is two distinct assignment to the same LHS tree at two different gimple statement, which is by definition not supposed to happened in SSA form. Is there any particular reason this happen? Is that because the address of x is taken and stored? I have to precise, I did not dig into the SSA form transformation and am a newbie to gcc source code. So maybe my question is a bit naive or a known issue. Thank you for your time, Pierrick
Re: [Tree-SSA] Question from observation, bogus SSA form?
On 16/03/2023 17:30, Martin Jambor wrote: Hello Pierrick, On Thu, Mar 16 2023, Pierrick Philippe wrote: Hi everyone, I was working around with the analyzer, but I usually dump the SSA-tree to get a view of the analyzed code. This is how I noticed something wrong, at least in the sense of the definition of SSA form. please note that only some DECLs are put into a SSA form in GCC, these are sometimes referred to as "gimple registers" and you can query the predicate is_gimple_reg to figure out whether a DECL is one (putting aside "virtual operands" which are a special construct of alias analysis, are in an SSA form but the predicate returns false for them for some reason). This means that global variables, volatile variables, aggregates, variables which are not considered aggregates but are nevertheless partially modified (think insertion into a vector) or variables which need to live in memory (most probably because their address was taken) are not put into an SSA form. It may not be easily possible. Alright, I understand, but I honestly find it confusing. I mean, aren't they any passes relying on the pure SSA form properties to analyze code? For example to DSE or DCE. [stripping] The thing is, there is two distinct assignment to the same LHS tree at two different gimple statement, which is by definition not supposed to happened in SSA form. I think it is now clear that x is not in SSA form because (at this stage of the compilation) it is still considered to need to live in memory. I think that I get your point. It would be a heavy design change to be able to write it in pure SSA form. Is there any particular reason this happen? Is that because the address of x is taken and stored? I have to precise, I did not dig into the SSA form transformation and am a newbie to gcc source code. So maybe my question is a bit naive or a known issue. No worries, we know these important details are not straightforward when you see them for the first time. Good luck with your gcc hacking! Thanks! :) Pierrick
[Static Analyzer] Loop handling - False positive for malloc-sm
Hi everyone, I'm still playing around with the analyzer, and wanted to have a look at loop handling. I'm using a build from /trunk/ branch (/20230309/). Here is my analyzed code: ''' 1| #include 2| int main(void) { 3| void * ptr = malloc(sizeof(int)); 4| for (int i = 0; i < 10; i++) { 5| if (i == 5) free(ptr); 6| } 7|} ''' And here, the malloc-sm is reporting a double-free on line 5 with a quite confusing output: ''' ./test.c: In function ‘main’: ./test.c:5:21: warning: double-‘free’ of ‘ptr’ [CWE-415] [-Wanalyzer-double-free] 5 | if (i == 5) free(ptr); | ^ ‘main’: events 1-13 | | 3 | void * ptr = malloc(sizeof(int)); | | ^~~ | | | | | (1) allocated here | 4 | for (int i = 0; i < 10; i++) { | | ~~~ | | | | | | | (5) ...to here | | (2) following ‘true’ branch (when ‘i <= 9’)... | | (6) following ‘true’ branch (when ‘i <= 9’)... | | (9) following ‘true’ branch (when ‘i <= 9’)... | 5 | if (i == 5) free(ptr); | | ~ ~ | | | | | | | (8) first ‘free’ here | | | (12) ...to here | | | (13) second ‘free’ here; first ‘free’ was at (8) | | (3) ...to here | | (4) following ‘false’ branch (when ‘i != 5’)... | | (7) ...to here | | (10) ...to here | | (11) following ‘true’ branch (when ‘i == 5’)... | ''' So, I'm guessing that this false positive is due to how the analyzer is handling loops. Which lead to my question: how are loops handled by the analyzer? Thanks for your time, Pierrick
Re: [Static Analyzer] Loop handling - False positive for malloc-sm
On 21/03/2023 00:30, David Malcolm wrote: On Mon, 2023-03-20 at 13:28 +0100, Pierrick Philippe wrote: Hi everyone, I'm still playing around with the analyzer, and wanted to have a look at loop handling. I'm using a build from /trunk/ branch (/20230309/). Here is my analyzed code: ''' 1| #include 2| int main(void) { 3| void * ptr = malloc(sizeof(int)); 4| for (int i = 0; i < 10; i++) { 5| if (i == 5) free(ptr); 6| } 7|} ''' [stripping] So, I'm guessing that this false positive is due to how the analyzer is handling loops. Which lead to my question: how are loops handled by the analyzer? Sadly, the answer is currently "not very well" :/ I implemented my own approach, with a "widening_svalue" subclass of symbolic value. This is widening in the Abstract Interpretation sense, (as opposed to the bitwise operations sense): if I see multiple values on successive iterations, the widening_svalue tries to simulate that we know the start value and the direction the variable is moving in. This doesn't work well; arguably I should rewrite it, perhaps with an iterator_svalue, though I'm not sure how it ought to work. Some ideas: * reuse gcc's existing SSA-based loop analysis, which I believe can identify SSA names that are iterator variables, figure out their bounds, and their per-iteration increments, etc. * rework the program_point or supergraph code to have a notion of "1st iteration of loop", "2nd iteration of loop", "subsequent iterations", or similar, so that the analyzer can explore those cases differently (on the assumption that such iterations hopefully catch the most interesting bugs) I see, I don't know if you ever considered allowing state machines to deal with loops on their own. Such as having an API to allow to register a callback to handle loops, but not in a mandatory way. Or having a set of APIs to optionally implement for the analyzer to call. It would allow state machines to analyze loops with the meaning of their inner analysis. Which could allow them to try to find a fixed point in the loop execution which doesn't have any impact on the program state for that state machine. Kind of like a custom loop invariant. Because depending of the analysis goal of the state machine, you might need to symbolically execute the loop only a few times before reentering the loop and having the entry state being the same as the end-of-loop state. In fact, this could be done directly by the analyzer, and only calling state machine APIs for loop handling which still has not reached such a fixed point in their program state for the analyzed loop, with a maximum number of execution fixed by the analyzer to limit execution time. Does what I'm saying make sense? In terms of implementation, loop detection can be done by looking for strongly connected components (SCCs) in a function graph having more than one node. I don't know if this is how it is already done within the analyzer or not? Thank you for your time, Pierrick
Re: Static Analyzer: correlate state for different region/svalue for reporting
On 16/03/2023 14:44, David Malcolm wrote: On Thu, 2023-03-16 at 09:54 +0100, Pierrick Philippe wrote: On 15/03/2023 17:26, David Malcolm wrote: On Wed, 2023-03-15 at 16:24 +0100, Pierrick Philippe wrote: [stripping] So, first question: is there any way to associate and track the state of a rvalue, independently of its lvalue? To try to clarify the question, here's an example: ''' int __attribute__("__some_attribute__") x = 42; /* STEP-1 From now on, we consider the state of x as being marked by some_attribute. But in fact, in the log, we can observe that we'll have something like this in the new '/ana::program_state/': {0x4b955b0: (int)42: marked_state (‘x’)} */ [stripping] int *y = &x; /* STEP-2 For analysis purpose, you want to know that from now on, y is pointing to marked data. So you set state of the LHS of the GIMPLE statement (i.e. some ssa_name instance of y) accordingly, with a state named 'points- to_marked_data' and setting 'x' as the origin of the state (in the sense of the argument /origin/ from '/ana::sm_context::on_transition/'. What we now have in the new '/ana::program_state/' is this: {0x4b9acb0: &x: points-to-marked_data (‘&x’) (origin: 0x4b955b0: (int)42 (‘x’)), 0x4b955b0: (int)42: marked_state (‘x’)} */ Yes: you've set the state on the svalue "&x", not on "y". int z = *y; /* STEP-3 Now you have implicit copy of marked data, and you want to report it. So you state the LHS of the GIMPLE statement (i.e. some ssa_name instance of z) as being marked, with 'y' as the origin. What we now have in the new '/ana::program_state/' is this: {0x4b9acb0: &x: points-to-marked_data (‘&x’) (origin: 0x4b955b0: (int)42 (‘x’)), 0x4b955b0: (int)42: marked_state (‘x’)} */ ''' Presumably the program_state also shows that you have a binding for the region "z", containing the svalue 42 (this is represented within the "store", within the region_model within the program_state). [stripping] This is an update about tracking state of svalues instead of region for other kind of variables than pointers. If you consider the following code: ''' int __attribute__((__taint__)) x = 42; /* Program state: {0x4b955b0: (int)42: marked_state (‘x’)} */ int y = 42; // Program state unchanged if (y); /* When querying the sm_context about the state of y, it returns it as being in a "marked_state", because its svalue is the same as x's one. Even though no call to change y's state has been made. And here it triggers a diagnostic for my analysis. */ ''' I understand way better now the internals of the analyzer regarding the state's tracking. I do completely understand now, why you've said you've mainly designed it for pointers, because this allow you to avoid to do some points-to analysis, by associating state with pointer's svalues instead of pointer's region. But as you can see in the above code example, it has its drawback for analyzing variables with a different semantics, such as integer type variable. I will have to modified the analyzer's code to add a way for state machine to ask the analyzer to track region's state instead of svalue's state to be able to keep using it with my analysis plugin. Do you think it would be interesting having such features merged within the analyzer? In any case, I'll start to work on it over the last /trunk/ branch, within an appropriate branch. Thank you for your time, Pierrick
Re: [Static Analyzer] Loop handling - False positive for malloc-sm
On 22/03/2023 19:19, David Malcolm wrote: On Tue, 2023-03-21 at 09:21 +0100, Pierrick Philippe wrote: [stripping] In fact, this could be done directly by the analyzer, and only calling state machine APIs for loop handling which still has not reached such a fixed point in their program state for the analyzed loop, with a maximum number of execution fixed by the analyzer to limit execution time. Does what I'm saying make sense? I think so, though I'm not sure how it would work in practice. Consider e.g. for (int i = 0; i < n; i++) head = prepend_node (head, i); which builds a chain of N dynamically-allocated nodes in a linked list. Well, that would be a case where the loop's analysis will depend of the state machine. If we consider the malloc-sm, it would allow it to track as different pointers each allocated pointers, until the limit of symbolic execution imposed by the analyzer is reached, are the svalue of N if it is a known integer at the current analysis point. For other, such as a the file-sm, it would only be needed to symbolically execute it once, assuming prepend_node() is not opening any files. So this state machine would not have to be executed more than once on the loop at this program point by the analyzer. I think the different steps for such a different cases of loop analysis, is somehow using the second point of the RFE you shared above. The "algorithm" I came with when thinking about it looks like this. Of course, I'm definitely not an expert on the analyzer, so it is possibly not feasible. * Detect loop, and try to get the termination constraint (possibly reduced if possible). * Iterate on the loops' node N: * If N is the loop's first node: * Check if the actual program state is in a sufficient state to satisfy the loop's termination constraint, If so, stop analyzing the loop * Otherwise, check if the maximum number of symbolic execution fixed by the analyzer is reached, If so, stop analyzing the loop * Otherwise, keep going * Call every sm still impacting their program state map on node N This should work for loops iterating on integer, for other kind of loops, it might be trickier though. In terms of implementation, loop detection can be done by looking for strongly connected components (SCCs) in a function graph having more than one node. I don't know if this is how it is already done within the analyzer or not? It isn't yet done in the analyzer, but as noted above there is code in GCC that already does that (in cfgloop.{h,cc}). I definitely have to look at this files. Thank you for your time, Pierrick
[analyzer][tree] Get the adress of a specific tree
Hi all, I am working around array using a plugin to the analyzer. And I face a problem here, I would like to be able to build a transformed representation of an element of the array from its subscript representation to its address representation (vice versa). To image what I'm saying, I'm looking for a way to pass from 't[(int)index]' to '&t + index * sizeof(element)', and from '&t + offset' to 't[offset / sizeof(element)]'. From now, I'll use: - "subscript form" for the 't[index]' representation. - "address form" for the '&t + offset' representation. I tried to do it manually: - only using the analyzer API (using /ana::region_model_manager::get_or_create_binop/) to get the corresponding /svalue/ (i.e. address form) of a subscript form. - by first building the /tree/ in address form using GIMPLE API (/fold_buildn/) and using analyzer API (/ana::region_model::get_rvalue/) to get its corresponding /svalue./ I managed to do it from the /tree/ in the subscript form, except that I can't use it to check if this /svalue/ is tracked within the current /sm_state_map/ because it's hash would be different than from the one potentially within the map, although they are semantically equivalent. Is there by any chance an API within the analyzer which allow to go over the live svalues? Cheers, -- Pierrick
[analyzer] Comparing svalues
Hi David, hi all, I'm working on a plugin for the analyzer, and basically I've reached a point where I need to compare svalues. For the need of my analysis, I've modified the analyzer to be able to track for region in some specific cases, so I modified the implementation of the /sm_state_map/. If anyone want to see my modifications, I would be glad to send it to you (not yet on a public repository). I'm trying to handle all the different (only defined behavior) semantically correct ways to manipulate arrays. To illustrate my words, here is an example: int t[4] = 0; t[2] = some_var; // valid and represented in GIMPLE by a single gassign stmt with LHS being an ARRAY_REF *(t+2) = some_var; // valid and represented in GIMPLE by two distincts gassign stmt with LHS_1 being a SSA_NAME and LHS_2 being a MEM_REF int *y = t + 1; *(y+1) = some_var; // valid and represented in GIMPLE by two distincts gassign stmt with LHS_1 being a SSA_NAME and LHS_2 being a MEM_REF In this example, the same memory is modified and correspond to 't[2]'. What I'm trying to do is to determine have a correlation between the region 't[2]' and the svalue '&t + 2 * sizeof(element)'. I've manage to pass from the tree '&t + 2 * sizeof(element)' to the corresponding region 't[2]' using the /ana::region_model_manager::get_element_region/ API. So that if I have the region corresponding to 't[2]' in the /sm_state_map/, it is correctly found within the inner /hash_map//region *, reg_entry_t>/. It gets weird when working from going to the tree 't[2]' to the corresponding svalue '&t + 2 * sizeof(element)'. Basically for now, I used several approaches: - I tried building the correspond tree using /buildN/ GIMPLE API and then the /ana::region_model::get_rvalue/ API, I did had a result being dumped as exactly what I needed, but the lookup (through /ana::sm_context::get_state/) within the inner /hash_map / of /sm_state_map/ was failing even though the same svalue was present in the /hash_map. /I tried to understand what was happening, and basically, it seems that the two svalues does not have the same address, though the same hash, leading to the lookup failure. - Right now, I am doing exactly the same to obtain the corresponding svalue, but instead of using /ana::sm_context::get_state/, I am iterating over all the live_values obtained through /ana::region_model::get_reachable_svalues/ until I find the same svalue in terms of semantics. Though, this is failing because there is currently no way to compare svalue's semantic. So, basically I'm kind of stuck here and I have no idea how to properly go from a tree representation to its svalue/region one. To explicit as much as possible I'm trying to do this: - Pass from 'tree t[2]' to 'svalue &t + 2 * sizeof(element)'; -> that part does not work - Pass from 'tree t + 2' to 'region t[2]'; -> that part is working Would you have any idea about an API I would have missed or anything else? I can definitely share my code if anyone want to have a look at it. Thanks for reading, Cheers, Pierrick
[gimple-ssa] Get the gimple corresponding to the definition of a VAR_DECL
Hi everyone, I'm trying to get the gimple * associated to the definition of a given var_decl. Basically, I am iterating over the locals of a function (through the local_decls member) and I need to be able to get the gimple * of its definition within the function's gimple_seq. Does any of you have an idea on how this could be achieved ? Thanks, Pierrick
Re: [gimple-ssa] Get the gimple corresponding to the definition of a VAR_DECL
On 27/06/2023 11:42, Richard Biener wrote: On Tue, Jun 27, 2023 at 11:36 AM Pierrick Philippe wrote: Hi everyone, I'm trying to get the gimple * associated to the definition of a given var_decl. Basically, I am iterating over the locals of a function (through the local_decls member) and I need to be able to get the gimple * of its definition within the function's gimple_seq. Does any of you have an idea on how this could be achieved ? You need to walk the body of the function looking for defining stmts. Note the VAR_DECL might be in SSA form in which case you can also walk all SSA names and check those whose SSA_NAME_VAR is the specific VAR_DECL. But then SSA names can "lose" their associated decks so when a variable is in SSA form the set of original assignments cannot always be recovered easily. Thank you for your answer Richard. I'll have a look into that. My main problem is regarding uninitialized definition, but still not being considered undefined behavior. For example, the following code: int x; int *y = &x; *y = 6; What I'm doing is basically looking at each gimple statement if the lhs has a given attribute for the purpose of the analysis I'm trying to perform. To precise, I'm plugged into the analyzer, so an out-of-tree plugin. But in the example above, the definition of x is not really within the gimple_seq of the function as it is never directly assigned. Pierrick Richard. Thanks, Pierrick
[Analyzer] Transform a region 't[i]' to its svalue form '&t + (i * element_size)'
Hi all, hi David, I am trying to use the analyzer API to transform an element_region (such as 't[1]') to a binop_svalue with an inner region_svalue (such as '&t + 4' in case of integers array) for analysis purpose. I managed to do it the other way around (i.e. from a binop_svalue to an element_region) using the analyzer API (code is in attached file offset_to_elm.cc). And everything is working as intended within the analyzer and my analysis. The problem I'm having here is probably due to a mistake I'm doing on some argument to a function call, but I cannot see it to be honest (code misbehaving is in elm_to_offset.cc). I attached a commented sample code I'd like to be able to analyze (test.c), alongside the commented GIMPLE code seen by the analyzer (test.c.075i.analyzer obtained through '-fdump-ipa-analyzer'). Feel free to ask question if needed of course. I manage to rebuild a binop_sval from an element_region (code is in attached file elm_to_offset.cc), but resulting object is not the same than the one already within the state_map of my state machine. To be clear, the svalue's pointer resulting from the code in elm_to_offset.cc is deifferent, though when they're logged with simple set to false, no difference seem to exist. And I really do have the intuition that the problem might be related to the call to region_model_manager::get_ptr_svalue on elm_to_offset.cc:17. I do think that the key is not find within region_model_manager::m_pointer_values_map is not found and the call to region_svalue constructor is performed within region_model_manager::get_ptr_svalue implementation. But still, I have no idea why. I did modified the analyzer state_map class to be able to also track regions states alongside svalues one (working on a patch, code is not respecting GCC's coding standard so far and most probably not optimized). Any idea on what I'm doing wrong here would really be appreciated. Beside this, I do think that enhancing the analyzer by allowing to track regions states could allow for a broader set of possible analysis. What do you think ? Thank you, Pierrick P.S.: @David, sorry for double send. I did a mistake in the gcc's mailing list address.const svalue *elm_to_offset(sm_context *sm_ctx, const region *reg, logger *logger) { const svalue* res = nullptr; if (logger) LOG_SCOPE(logger); auto model = sm_ctx->get_new_program_state()->m_region_model; auto mgr = model->get_manager(); bit_offset_t offset; const svalue *offset_sval = nullptr; if (reg->get_relative_concrete_offset(&offset)) { auto offset_byte = offset / BITS_PER_UNIT; offset_sval = mgr->get_or_create_constant_svalue(wide_int_to_tree(sizetype, offset_byte)); auto base = reg->get_base_region(); auto base_sval = mgr->get_ptr_svalue(TYPE_POINTER_TO(base->get_type()), base); res = mgr->get_or_create_binop(base->get_type(), POINTER_PLUS_EXPR, base_sval, offset_sval); } // TODO: symbolic offset else if (logger) logger->log("Offset is symbolic, not implemented."); if (logger) { logger->start_log_line(); logger->log_partial("res: %p | ", res); res ? res->dump_to_pp(logger->get_printer(), false) : logger->log_partial("nullptr"); logger->log_partial(" | "); res ? res->dump_to_pp(logger->get_printer(), true) : logger->log_partial("nullptr"); logger->end_log_line(); } return res; } const region *offset_to_elm(sm_context * sm_ctx, const svalue *sval, logger *logger) { const region * res = nullptr; if (logger) LOG_SCOPE(logger); auto model = sm_ctx->get_new_program_state()->m_region_model; if (const region *deref = model->deref_rvalue(sval, NULL_TREE, nullptr)) { if (logger) { logger->start_log_line(); logger->log_partial("deref: "); deref->dump_to_pp(logger->get_printer(), false); logger->end_log_line(); } auto base = deref->get_base_region(); bit_offset_t offset; bit_size_t size; if (deref->get_relative_concrete_offset(&offset) && deref->get_bit_size(&size)) { auto index = offset / size; auto model_mgr = model->get_manager(); auto index_sized = wide_int_to_tree(sizetype, index); auto sval_index_sized = model_mgr->get_or_create_constant_svalue(index_sized); res = model_mgr->get_element_region(base, TREE_TYPE(base->get_type()), sval_index_sized); } // TODO: symbolic offset else if (logger) logger->log("Offset is symbolic, not implemented."); } if (logger) { logger->start_log_line(); logger->log_partial("input: %p | ", sval); sval->dump_to_pp(logger->get_printer(), true); logger->log_partial(" | "); sval->dump_to_pp(logger->get_printer(), false); logger->end_log_line(); logger->start_log_line(); logger->log_partial("res: "); res ? res->dump_to_pp(logger->get_printer(), true) : logger->log_partial("nullptr"); logger->end_log_line(); } return res; } void some_func(void)
[Tree][Static Analyzer] Tree representing types and svalues type
Hi David, hi all, I was playing along with APIs from the Static Analyzer and encountered a segfault in gcc/tree.cc:5068 (i.e. in function build2 and failure is due to a gcc_assert call), after a call to ana::region_model::get_representative_tree. >From my debugging of the problem, I realized that the svalue which I built with the SA API as the following from an element_region (C sample): const ana::region *base = elm_reg->get_base_region(); // base_ptr is the svalue passed to get_representative_tree const ana::svalue *base_ptr = base->get_ptr_svalue( TYPE_POINTER_TO(base->get_type), base); I realized that the SA resulting in the call to get_ptr_svalue has NULL_TREE as type (return by get_type on base_ptr). And this is because TYPE_POINTER_TO (base->get_type) is returning a NULL_TREE. I found my way around by calling build_pointer_type(base->get_type()) which is currently building the expecting type. But from this, two questions raised in my mind: 1. Is it coherent for the ana::region_model_manager::get_ptr_svalue to return a sval with a null type? 2. Can a tree view as a tree_type_common member have an uninitialized pointer_to member? I think, but definitely not sure) that the member is not initialized here by some part of the compiler, because I think that the inner type of regions and svalues are build by the SA by using the TREE_TYPE macro directly from the GIMPLE. I ask this question to know if this is a bug and I should try to do a minimal example out of it, or if it is a intended behavior for any reason from the compiler. Thank you for reading me, Pierrick
Re: [Tree][Static Analyzer] Tree representing types and svalues type
On 17/01/2024 23:52, David Malcolm wrote: > On Tue, 2024-01-16 at 15:44 +0100, Pierrick Philippe wrote: >> Hi David, hi all, > Hi Pierrick. First, thanks for you answer. [stripping] > I confess that I've been quite sloppy in places with types in the > analyzer, keeping track of them when it's easy, but letting them be > NULL in places where getting a type was hard, or apparently not > possible. > > Sorry about this. > > I think my implementation is a bit muddled about what an svalue > actually is. > > Informally, I think of an svalue as a partial function from bit offsets > to values of the bits. For example: > * region_svalue: e.g. bit offsets 0-63 are the pointer to REG; > undefined elsewhere > * constant_svalue: e.g. given (int)42, bit offsets 0-31 are the > constant 42, encoded in the target's usual way; undefined elsewhere > * unknown_svalue: undefined everywhere > ...etc, and the above definition doesn't need svalues to have a type. > > The above is rather vague: how should this work for unaryop_svalue and > binop_svalue; these need to have types, so that we can think of them as > e.g. > * "bits 0-31 are the result of a signed 32-bit int addition of the > values of bits 0-32 from X and Y when considered as signed 32-bit int". > * "bits 0-31 are the result of zero-extending bits 0-7 of X and > treating the result as uint32_t". I do understand, but here, the thing is that the analyzer built a region_svalue without any type. And thought it was a bit weird. But I do get your point here. > In the meantime, you can't rely on an svalue having a non-NULL type, > and hence it's often not possible to get a representative tree from an > svalue, since tree expressions must have non-NULL types. But then, shouldn't the analyzer have a defensive approach in get_representative_tree and return a NULL_TREE in such cases? Because the segfault is happening behind the scene. Maybe it shouldn't, I have no idea what would be the best approach here. >> But from this, two questions raised in my mind: >> >> 1. Is it coherent for the ana::region_model_manager::get_ptr_svalue >> to >> return a sval with a null type? > It depends what you mean by "coherent"; given the sloppy approach > above, the answer is "yes". (nod) >> 2. Can a tree view as a tree_type_common member have an uninitialized >> pointer_to member? > It shouldn't be uninitialized. Perhaps the tree node isn't the right > kind? You should use the TYPE_POINTER_TO macro to access the field, > which will use TYPE_CHECK to check that the node is a type in a debug > build of the compiler. Well, I'm pretty sure I didn't build it myself and only used macros and API functions to get that type. And this is what is confusing me. [stripping] > Sounds like a bug; please try to construct a minimal example. Sure, I'll try to build a minimal example. Will have to use a plugin to do so though, hope that won't be a problem. Thanks, Pierrick > Thanks > Dave > >> Thank you for reading me, >> >> Pierrick
Using std types and functions within GCC
Hi all, I was wondering, is there any conventions or guidelines regarding the usage of types and/or functions coming from the C++ std library within the compiler? To explicit a bit more, I am working on modification on the analyzer and might need to use a pair. Thank you for your time, Pierrick
[gimple-ssa] result_decl and ssa_name
Hi all, I do have a question regarding ssa_name and result_decl. For example on the following gimple function: int f () { int x; int D.2747; int _2; : x_1 = 42; _2 = x_1; : : return _2; } On the above example, using the macro SSA_NAME_VAR() on _2 does not yield anything usable. Neither to call ssa_default_def() on the result of the result_decl obtain through macro DECL_RESULT(). Is there a way to get the ssa_name corresponding to the result_decl of a function obtained through the use of macro DECL_RESULT() on a fn_decl? And/or the other way around? I.e., from the returned ssa_name of a function to the result_decl of that function? I totally might be missing something here, but I cannot figure out what. Thanks for your time, Pierrick
Re: [gimple-ssa] result_decl and ssa_name
On 05/04/2024 14:46, Richard Biener wrote: > On Fri, Apr 5, 2024 at 1:59 PM Pierrick Philippe > wrote: >> Hi all, >> >> I do have a question regarding ssa_name and result_decl. >> >> For example on the following gimple function: >> >> int f () >> { >> int x; >> int D.2747; >> int _2; >> >>: >> x_1 = 42; >> _2 = x_1; >> >>: >> : >> return _2; >> >> } >> >> On the above example, using the macro SSA_NAME_VAR() on _2 does not >> yield anything usable. >> Neither to call ssa_default_def() on the result of the result_decl >> obtain through macro DECL_RESULT(). >> >> Is there a way to get the ssa_name corresponding to the result_decl of a >> function obtained through the use of macro DECL_RESULT() on a fn_decl? >> And/or the other way around? I.e., from the returned ssa_name of a >> function to the result_decl of that function? >> >> I totally might be missing something here, but I cannot figure out what. > DECL_RESULT isn't always used (as in your example). Not all SSA names > have corresponding declarations, we have "anonymous" SSA names which > have a NULL_TREE SSA_NAME_VAR (such as the _2 in your example). I see, that makes so much more sense to me now. > What do you try to find in the end? If you want to find all returns you can > walk predecessors of EXIT_BLOCK and look at their last stmt whether they > are greturn statements. I am implementing a state_machine within the analyzer, and I am trying to understand where would be the best place to propagate the state of the return value. I intuitively thought it would be best to do so in the state_machine::on_pop_frame() method, which is called by the analyzer between the two frames of the caller and the callee. What I do have access to is the struct function of the callee/caller, the gcall instruction in the caller and the callee have been processed by my analysis. And to illustrate, here I do have the _2 ssa_name and its state which I know in that case should be propagate to the lhs of the caller gcall instruction. Again I might be taking this in a wrong way. > Richard. >> Thanks for your time, >> >> Pierrick
Re: [gimple-ssa] result_decl and ssa_name
On 06/04/2024 14:53, Richard Biener wrote: > On Fri, Apr 5, 2024 at 3:44 PM Pierrick Philippe > wrote: >> On 05/04/2024 14:46, Richard Biener wrote: >> >> On Fri, Apr 5, 2024 at 1:59 PM Pierrick Philippe >> wrote: >> >> Hi all, >> >> I do have a question regarding ssa_name and result_decl. >> >> For example on the following gimple function: >> >> int f () >> { >> int x; >> int D.2747; >> int _2; >> >>: >> x_1 = 42; >> _2 = x_1; >> >>: >> : >> return _2; >> >> } >> >> On the above example, using the macro SSA_NAME_VAR() on _2 does not >> yield anything usable. >> Neither to call ssa_default_def() on the result of the result_decl >> obtain through macro DECL_RESULT(). >> >> Is there a way to get the ssa_name corresponding to the result_decl of a >> function obtained through the use of macro DECL_RESULT() on a fn_decl? >> And/or the other way around? I.e., from the returned ssa_name of a >> function to the result_decl of that function? >> >> I totally might be missing something here, but I cannot figure out what. >> >> DECL_RESULT isn't always used (as in your example). Not all SSA names >> have corresponding declarations, we have "anonymous" SSA names which >> have a NULL_TREE SSA_NAME_VAR (such as the _2 in your example). >> >> I see, that makes so much more sense to me now. >> >> What do you try to find in the end? If you want to find all returns you can >> walk predecessors of EXIT_BLOCK and look at their last stmt whether they >> are greturn statements. >> >> I am implementing a state_machine within the analyzer, and I am trying to >> understand where would be the best place to propagate the state of the >> return value. >> I intuitively thought it would be best to do so in the >> state_machine::on_pop_frame() method, which is called by the analyzer >> between the two frames of the caller and the callee. What I do have access >> to is the struct function of the callee/caller, the gcall instruction in the >> caller and the callee have been processed by my analysis. > It might make sense to record the analysis of the return value in > meta-data that the analyzer keeps and access it that way. > Other than that you'd have to do it the way I said with finding the > greturn stmts again and look at what is returned there. That is what I had in mind, thanks for answering me. I do have another question though, how do you obtain the decl_context of such an ssa_name? The DECL_CONTEXT macro is failing during the tree check and I ha ve no idea how to know where a given ssa_name is declared without accessing its inner var (through SSA_NAME_VAR). And this operation is failing on _i ssa_name trees. >> And to illustrate, here I do have the _2 ssa_name and its state which I know >> in that case should be propagate to the lhs of the caller gcall instruction. >> >> Again I might be taking this in a wrong way. >> >> Richard. >> >> Thanks for your time, >> >> Pierrick