Patch for PR analyzer/98797
The attached patch has been bootstrapped and regression tested. It addresses PR analyzer/98797, which is built off the expected failures in gcc/testsuite/gcc.dg/analyzer/casts-1.c It fixes those failures and additional manifestations. That testsuite file has been updated to no longer expect failures and to include additional tests addressing the other manifestations I found. Some additional work can probably be done to handle further cases, but this is now substantially more robust than the current status. Thanks.commit f3654c5a322f241c8cc551f88257a495689ed9d9 Author: Brian Sobulefsky Date: Tue Feb 9 16:25:43 2021 -0800 Address the bug found in test file gcc/testsuite/gcc.dg/analyzer/casts-1.c, as recorded by the XFAIL, and some simpler and more complex versions found during the investigation of it. PR analyzer/98797 was opened for these bugs. The simplest manifestation of this bug can be replicated with: char arr[] = {'A', 'B', 'C', 'D'}; char *pt = ar; __analyzer_eval(*(pt + 0) == 'A'); __analyzer_eval(*(pt + 1) == 'B'); //etc The result of the eval call is "UNKNOWN" because the analyzer is unable to determine the value at the pointer. Note that array access (such as arr[0]) is correctly handled. The code responsible for this is in file region-model-manager.cc, function region_model_manager::maybe_fold_sub_svalue. The relevant section is commented /* Handle getting individual chars from STRING_CST */. This section only had a case for an element_region. A case needed to be added for an offset_region. Additionally, when the offset was 0, such as in *pt or *(pt + 0), the function region_model_manager::get_offset_region was failing to make the needed offset region at all. This was due to the test for a constant 0 pointer that was then returning get_cast_region. A special case is needed for when PARENT is of type array_type whose type matches TYPE. In this case, get_offset_region is allowed to continue to normal conclusion. The original bug noted in gcc/testsuite/gcc.dg/analyzer/casts-1.c was for the case: struct s1 { char a; char b; char c; char d; }; struct s2 { char arr[4]; }; struct s2 x = {{'A', 'B', 'C', 'D'}}; struct s1 *p = (struct s1 *)&x; __analyzer_eval (p->a == 'A'); //etc This requires a case added to region_model_manager::maybe_fold_sub_svalue in the individual characters from string constant section for a field region. Finally, the prior only works for the case where struct s2 was a single field struct. The more general case is: struct s2 { char arr1[2]; char arr2[2]; }; struct s2 x = {{'A', 'B'}, {'C', 'D'}}; Here the array will never be found in the get_any_binding routines. A new routine is added to class binding_cluster that checks if the region being searched is a field_region of a cast_region, and if so, tries to find a field of the original region that contains the region under examination. This new function is named binding_cluster::get_parent_cast_binding. It is called from get_binding_recursive. gcc/analyzer/ChangeLog: PR analyzer/98797 * region-model-manager.cc region_model_manager::get_offset_region: Add check for a PARENT array whose type matches TYPE, and have this skip returning get_cast_region and rather conclude the function normally. * region-model-manager.cc region_model_manager::maybe_fold_sub_svalue Update the get character from string_cst section to include cases for an offset_region and a field_region. * store.cc binding_cluster::get_binding_recursive: Add case for call to new function get_parent_cast_binding. * store.cc binding_cluster::get_parent_cast_binding: New function. * store.h class binding_cluster: Add declaration for new member function get_parent_class_binding and macros for bit to byte conversion. gcc/testsuite/ChangeLog: PR analyzer/98797 * gcc.dg/analyzer/casts-1.c: Update file to no longer expect failures and add test cases for additional bugs solved. diff --git a/gcc/analyzer/region-model-manager.cc b/gcc/analyzer/region-model-manager.cc index dfd2413e914..9aee4c498c6 100644 --- a/gcc/analyzer/region-model-manager.cc +++ b/gcc/analyzer/region-model-manager.cc @@ -602,16 +602,46 @@ region_model_manager::maybe_fold_sub_svalue (tree type, /* Handle getting individual chars from a STRING_CST. */ if (tree cst = parent_svalue->maybe_get_constant ()) if (TREE_CODE (cst) == STRING_CST) - if (const element_region *element_reg + { + if (const element_region *element_reg = subregion->dyn_cast_elem
Re: Patch for PR analyzer/98797
I apologize. I left a dangling space in the patch. I am guessing that this is against whitespace policy. I had actually corrected it but forgot to add the file to the commit. Please see the attached.commit a9b60f36b499463b49095f28ce8053f6387c506d Author: Brian Sobulefsky Date: Tue Feb 9 16:25:43 2021 -0800 Address the bug found in test file gcc/testsuite/gcc.dg/analyzer/casts-1.c, as recorded by the XFAIL, and some simpler and more complex versions found during the investigation of it. PR analyzer/98797 was opened for these bugs. The simplest manifestation of this bug can be replicated with: char arr[] = {'A', 'B', 'C', 'D'}; char *pt = ar; __analyzer_eval(*(pt + 0) == 'A'); __analyzer_eval(*(pt + 1) == 'B'); //etc The result of the eval call is "UNKNOWN" because the analyzer is unable to determine the value at the pointer. Note that array access (such as arr[0]) is correctly handled. The code responsible for this is in file region-model-manager.cc, function region_model_manager::maybe_fold_sub_svalue. The relevant section is commented /* Handle getting individual chars from STRING_CST */. This section only had a case for an element_region. A case needed to be added for an offset_region. Additionally, when the offset was 0, such as in *pt or *(pt + 0), the function region_model_manager::get_offset_region was failing to make the needed offset region at all. This was due to the test for a constant 0 pointer that was then returning get_cast_region. A special case is needed for when PARENT is of type array_type whose type matches TYPE. In this case, get_offset_region is allowed to continue to normal conclusion. The original bug noted in gcc/testsuite/gcc.dg/analyzer/casts-1.c was for the case: struct s1 { char a; char b; char c; char d; }; struct s2 { char arr[4]; }; struct s2 x = {{'A', 'B', 'C', 'D'}}; struct s1 *p = (struct s1 *)&x; __analyzer_eval (p->a == 'A'); //etc This requires a case added to region_model_manager::maybe_fold_sub_svalue in the individual characters from string constant section for a field region. Finally, the prior only works for the case where struct s2 was a single field struct. The more general case is: struct s2 { char arr1[2]; char arr2[2]; }; struct s2 x = {{'A', 'B'}, {'C', 'D'}}; Here the array will never be found in the get_any_binding routines. A new routine is added to class binding_cluster that checks if the region being searched is a field_region of a cast_region, and if so, tries to find a field of the original region that contains the region under examination. This new function is named binding_cluster::get_parent_cast_binding. It is called from get_binding_recursive. gcc/analyzer/ChangeLog: PR analyzer/98797 * region-model-manager.cc region_model_manager::get_offset_region: Add check for a PARENT array whose type matches TYPE, and have this skip returning get_cast_region and rather conclude the function normally. * region-model-manager.cc region_model_manager::maybe_fold_sub_svalue Update the get character from string_cst section to include cases for an offset_region and a field_region. * store.cc binding_cluster::get_binding_recursive: Add case for call to new function get_parent_cast_binding. * store.cc binding_cluster::get_parent_cast_binding: New function. * store.h class binding_cluster: Add declaration for new member function get_parent_class_binding and macros for bit to byte conversion. gcc/testsuite/ChangeLog: PR analyzer/98797 * gcc.dg/analyzer/casts-1.c: Update file to no longer expect failures and add test cases for additional bugs solved. diff --git a/gcc/analyzer/region-model-manager.cc b/gcc/analyzer/region-model-manager.cc index dfd2413e914..1fd6b94f20a 100644 --- a/gcc/analyzer/region-model-manager.cc +++ b/gcc/analyzer/region-model-manager.cc @@ -602,16 +602,46 @@ region_model_manager::maybe_fold_sub_svalue (tree type, /* Handle getting individual chars from a STRING_CST. */ if (tree cst = parent_svalue->maybe_get_constant ()) if (TREE_CODE (cst) == STRING_CST) - if (const element_region *element_reg + { + if (const element_region *element_reg = subregion->dyn_cast_element_region ()) - { - const svalue *idx_sval = element_reg->get_index (); - if (tree cst_idx = idx_sval->maybe_get_constant ()) - if (const svalue *char_sval - = maybe_get_char_from_string_cst (cst, cst_idx)) - return get_or_create_cast (type, char_sval); - } - + { + const svalue *idx_sval = element_reg->get_index (); + if (tree cst
Re: Patch for PR analyzer/98797
Hi, answers below. Note, do you want my resubmitted patch to be with commit --amend, and therefore relative to "real" commits in the history, or do you want me to chain it onto the last submission? I have already sent to assign at gnu for the form. I will update the commit message to call it s3 now. The case did not arise "in practice." After solving the xfail from the test suite, I tried dividing the array to have more than one field and found that this still did not work, so I tracked down why. I would argue that the case of a struct made of only a single array as its one field is a lot less likely to arise in practice. For what its worth, the single field example basically works "for the wrong reason." When get_binding_recursive goes up the parent chain, it finds a binding in the parent cast region because that cast is a struct made of only one field, and so its binding_key is the same as that field. The "right" logic would check for a field covering the desired region. I'll fix the ChangeLog formatting parentheses. I found the formatting script, but I did not have one of the imported python files so I just copied others. I can add the helper routine you have requested. Where would you like it, I am thinking analyzer.h so it is visible basically everywhere? I can add the check for size matching, which helper routine is easiest. Yes, in the event of *p == 'A', *p, having a constant zerop offset would get sent to get_cast_region, where it would become a cast region. This defeats the new logic in maybe_fold_sub_svalue. I think in the case where we are pointing to an array of the requested type, it is much more correct to return the offset region with a zero offset, as arr[0] should not be different than arr[1]. Sometimes the 0 tree would be of pointer_type, and I was not sure if this would defeat it or not, so I made sure it was of integer_type. This may just be a matter of my being new and not knowing for sure how everything works and so erring on the side of safety. As of now, the routine immediately rejects any case other than where reg is a field_region and parent is a cast_region. I will think if there is a C like syntax for the function, really it is checking if the original form of parent had a field covering the requested field. I will remove the first guard, leave the second, and try to reformat the latters into a similar rejection style. Currently, get_field_at_bit_offset is not an externally visible function. I am taking your instruction to reuse it to mean a prototype should be added to the relevant header (it would be region.h I think, as long as both region-model-manager.cc and store.cc include those). As you know, I am very new to gcc and so was happy when I could hack my way to this working at all. I already assumed my direct access was not correct. Most of what I did is based on "RTFS" since the manual does not really cover the right way to do these things. I will try the routine or otherwise macro you say. I will change the testfile to leave pre exisiting lines in place. Sent with ProtonMail Secure Email. ‐‐‐ Original Message ‐‐‐ On Friday, February 12, 2021 4:16 PM, David Malcolm wrote: > On Wed, 2021-02-10 at 19:42 +, brian.sobulefsky wrote: > > Hi Brian > > Thanks for the patch. > > The patch is large enough to count as legally significant; I've sent > you information on copyright assignment separately; the patch can't be > committed until that paperwork is in place. > > In the meantime, I've added some review notes inline below throughout: > > > Address the bug found in test file > > gcc/testsuite/gcc.dg/analyzer/casts-1.c, as > > recorded by the XFAIL, and some simpler and more complex versions found > > during > > the investigation of it. PR analyzer/98797 was opened for these bugs. > > > > The simplest manifestation of this bug can be replicated with: > > > > char arr[] = {'A', 'B', 'C', 'D'}; > > char *pt = ar; > > __analyzer_eval(*(pt + 0) == 'A'); > > __analyzer_eval(*(pt + 1) == 'B'); > > //etc > > > > The result of the eval call is "UNKNOWN" because the analyzer is unable > > to > > determine the value at the pointer. Note that array access (such as > > arr[0]) is > > correctly handled. The code responsible for this is in file > > region-model-manager.cc, function > > region_model_manager::maybe_fold_sub_svalue. > > The relevant section is commented /* Handle getting individual chars > > from > > STRING_CST */. This section only had a case for an element_region. A > > case > > needed to be added for an offset_region. > > > > Additionally, when the offset was 0, such as in *pt or *(pt + 0), the > > function > > region_model_manager::get_offset_region was failing to make the needed > > offset > > region at all. This was due to the test for a constant 0 pointer that > > was then > > returning get_cast_region. A special case is needed for when PARENT is > > of t
Re: Patch for PR analyzer/98797
David, Two items I needs answered to get you the patch. 1) The get_field_at_bit_offset function is static in region.cc, so I cannot use it outside of the file (you want me to use it in store.cc) without updating that definition to extern. I am assuming you want this. 2) I am updating my direct access of the tree fields. I am using int_bit_position for the position of a field in a struct as you requested. What do I use for the size of the struct. I had been using ->decl_common.size->int_cst[0]. Brian Sent with ProtonMail Secure Email. ‐‐‐ Original Message ‐‐‐ On Saturday, February 13, 2021 9:53 AM, brian.sobulefsky wrote: > Hi, answers below. Note, do you want my resubmitted patch to be with > commit --amend, and therefore relative to "real" commits in the history, > or do you want me to chain it onto the last submission? > > I have already sent to assign at gnu for the form. > > I will update the commit message to call it s3 now. > > The case did not arise "in practice." After solving the xfail from the > test suite, I tried dividing the array to have more than one field and > found that this still did not work, so I tracked down why. I would > argue that the case of a struct made of only a single array as its one > field is a lot less likely to arise in practice. For what its worth, > the single field example basically works "for the wrong > reason." > When get_binding_recursive goes up the parent chain, it finds > a binding > in the parent cast region because that cast is a struct made of > only one > field, and so its binding_key is the same as that field. The > "right" > logic would check for a field covering the desired region. > > I'll fix the ChangeLog formatting parentheses. I found the formatting > script, but I did not have one of the imported python files so I just copied others. > > I can add the helper routine you have requested. Where would you like it, > I am thinking analyzer.h so it is visible basically everywhere? > > I can add the check for size matching, which helper routine is easiest. > > Yes, in the event of *p == 'A', *p, having a constant zerop offset > would get sent to get_cast_region, where it would become a cast region. > This defeats the new logic in maybe_fold_sub_svalue. I think in the case > where we are pointing to an array of the requested type, it is much more > correct to return the offset region with a zero offset, as arr[0] should not > be different than arr[1]. Sometimes the 0 tree would be of pointer_type, > and I was not sure if this would defeat it or not, so I made sure it was > of integer_type. This may just be a matter of my being new and not > knowing for sure how everything works and so erring on the side of safety. > > As of now, the routine immediately rejects any case other than where > reg is a field_region and parent is a cast_region. I will think if there > is a C like syntax for the function, really it is checking if the original > form of parent had a field covering the requested field. > > I will remove the first guard, leave the second, and try to reformat > the latters into a similar rejection style. > > Currently, get_field_at_bit_offset is not an externally visible function. > I am taking your instruction to reuse it to mean a prototype should > be added to the relevant header (it would be region.h I think, as long as > both region-model-manager.cc and store.cc include those). > > As you know, I am very new to gcc and so was happy when I could hack my > way to this working at all. I already assumed my direct access was not > correct. Most of what I did is based on "RTFS" since the manual does not > really cover the right way to do these things. I will try the routine > or otherwise macro you say. > > I will change the testfile to leave pre exisiting lines in place. > > Sent with ProtonMail Secure Email. > > ‐‐‐ Original Message ‐‐‐ > On Friday, February 12, 2021 4:16 PM, David Malcolm dmalc...@redhat.com wrote: > > > On Wed, 2021-02-10 at 19:42 +, brian.sobulefsky wrote: > > Hi Brian > > Thanks for the patch. > > The patch is large enough to count as legally significant; I've sent > > you information on copyright assignment separately; the patch can't be > > committed until that paperwork is in place. > > In the meantime, I've added some review notes inline below throughout: > > > > > Address the bug found in test file gcc/testsuite/gcc.dg/analyzer/casts-1.c, as > > > recorded by the XFAIL, and some simpler and more complex versions found during > > > the investigation of it. PR analyzer/98797 was opened for these bugs. > > > > > > The simplest manifestation of this bug can be replicated with: > > > > > > char arr[] = {'A', 'B', 'C', 'D'}; > > > char *pt = ar; > > > __analyzer_eval(*(pt + 0) == 'A'); > > > __analyzer_eval(*(pt + 1) == 'B'); > > > //etc > > > > > > The result of the eval call is "UNKNOWN" because the analyzer is unable to > > > determine the va
Re: Patch for PR analyzer/98797
Hi David, I'm sorry but I wanted to get this out to you. To clarify, I had a statement of the form: end_offset <= covering_field->field_decl.bit_offset->int_cst.val[0] covering_field->decl_common.size->int_cst.val[0] - 1; (Sorry if my email is clobbering the angle brackets). I have replaced the first expression with int_bit_position (covering_field), I am not sure where to properly access the size of the field. FWIW, I found your region::get_byte_size, which calls int_size_in_bytes, but this crashes gcc for a field tree, it wants a type tree. Additionally, is there some proper way to access a bit_offset_t other than bit_off.get_val ()[0]? That is what I am using now, but I can swap it out. Sorry for the newbie questions, but these things aren't really documented in one place, at least that I am aware of. Brian
Re: Patch for PR analyzer/98797
Hi David, Please find my revised patch attached. You should find all items addressed. I have bootstrapped and rerun the testsuite (gcc and g++ only). Also, you should have seen my legal paperwork go in. 1) The commit message has been updated to reflect the nomenclature in the testsuite, as well as include the required parentheses. 2) The commit message has been updated to reflect the other changes you requested below. 3) My macros and inline code for "bit_byte_conversion" have been replaced with the subroutine you requested. It works by modulo and division rather than bitwise-and and bitshift, as requested. Without any specific instructions, I put the routine in analyzer.cc and the prototype in analyzer.h, as it provides a generic helper service. You can move it anywhere else you might want it. 4) I added the check for the field to be sizeof (char). This is actually a cause to reject getting a char from a string constant for any case, so I added the check near the outside of the nested ifs. Also, as a side note, it looks to me like your example: void test_4 () { const char *s = "ABCD"; __analyzer_eval (*(short *)s == 'A'); } is unrelated to my patch, as I remember a char pointer to a string constant is handled as a special case elsewhere and does not end up in maybe_fold_sub_svalue. 5) I updated the modification to get_offset_region so that my new code is only run in the case of a zerop. This was really the failure issue anyway. I still have it building a new constant svalue of zero, because I do not know for sure if the zero pointer type tree and zero integer type tree are different enough to cause confusion down the line when getting the character. 6) My new subroutine get_parent_cast_region takes any svalues as an argument and does its own checks for svalue_kind correctness, returning NULL otherwise. My reason for this is so get_binding_recursive would remain absolutely clean, the way its recursive call is. Basically, get_binding_recursive can just call get_parent_cast_region as an assignment within a conditional, just like the recursion step is, and the returned NULLs, whether by incorrect type or by not finding a binding for the correct type, causes it to bypass returning there. 7) The if guards were updated as requested. 8) I used your get_field_at_bit_offset routine as requested. To stress again, this was a static function to region.cc. I had to add a forward declaration (I again used analyzer.h because it is a "generic helper" function, but that is easy for you to change if you do not like it there) *and* I had to update your definition to extern rather than static. Otherwise, it seems to work. 9) The compound conditionals should all be to your liking now. 10) After all the back and forth, I think I am using "approved" helper functions and macros rather than accessing the tree fields directly. It seems they all work correctly. 11) All preexisting deja-gnu tests are on the original lines they were on before. All new tests are appended. Blank lines are left where the dg-bogus calls were. Thank you, Briancommit 7f0e3685900b527b64b81c3b44af36fd9e99bea3 Author: Brian Sobulefsky Date: Tue Feb 9 16:25:43 2021 -0800 Address the bug found in test file gcc/testsuite/gcc.dg/analyzer/casts-1.c, as recorded by the XFAIL, and some simpler and more complex versions found during the investigation of it. PR analyzer/98797 was opened for these bugs. The simplest manifestation of this bug can be replicated with: char arr[] = {'A', 'B', 'C', 'D'}; char *pt = ar; __analyzer_eval(*(pt + 0) == 'A'); __analyzer_eval(*(pt + 1) == 'B'); //etc The result of the eval call is "UNKNOWN" because the analyzer is unable to determine the value at the pointer. Note that array access (such as arr[0]) is correctly handled. The code responsible for this is in file region-model-manager.cc, function region_model_manager::maybe_fold_sub_svalue. The relevant section is commented /* Handle getting individual chars from STRING_CST */. This section only had a case for an element_region. A case needed to be added for an offset_region. Additionally, when the offset was 0, such as in *pt or *(pt + 0), the function region_model_manager::get_offset_region was failing to make the needed offset region at all. This was due to the test for a constant 0 pointer that was then returning get_cast_region. A special case is needed for when PARENT is of type array_type whose type matches TYPE. In this case, get_offset_region is allowed to continue to normal conclusion. The original bug noted in gcc/testsuite/gcc.dg/analyzer/casts-1.c was for the case: struct s1 { char a; char b; char c; char d; }; struct s2 { char arr[4]; }; struct s2 x = {{'A', 'B', 'C', 'D'}}; struct s1 *p = (struct s1 *)&x; __analyzer_eval (p->a == 'A'); //etc
PR analyzer/94362 Partial Fix
Hi, Please find a patch to fix part of the bug PR analyzer/94362. This bug is a false positive for a null dereference found when compiling openssl. The cause is the constraint_manager not knowing that i >= 0 within the for block: for ( ; i-- > 0; ) The bug can be further reduced to the constraint manager not knowing that i >= 0 within the if block: if (i-- > 0) which is not replicated for other operators, such as prefix decrement. The cause is that the constraint is applied to the initial_svalue of i, while it is a binop_svalue of i that enters the block (with op PLUS and arg1 -1). The constraint_manager does not have any constraints for this svalue and has no handler. A handler has been added that essentially recurs on the remaining arg if the other arg and other side of the condition are both constants and the op is PLUS_EXPR or MINUS_EXPR. This in essence fixed the problem, except an off by one error had been hiding in range::eval_condition. This error is hard to notice, because, for example, the code if(idx > 0) __analyzer_eval(idx >= 1); will compile as (check -fdump-ipa-analyzer to see) void test (int idx) { _Bool _1; int _2; : if (idx_4(D) > 0) goto ; [INV] else goto ; [INV] : _1 = idx_4(D) > 0; _2 = (int) _1; __analyzer_eval (_2); : return; } and will print "TRUE" to the screen, but as you can see, it is for the wrong reason, because we are not really testing the condition we wanted to test. You can force the failure (i.e. "UNKNOWN") for yourself with the following: void test(int idx) { int check = 1; if(idx > 0) __analyzer_eval(idx >= check); } which the compiler will not "fix" on us. An examination of range::eval_condition should convince you that there is an off by one error. Incidentally, I might recommend doing away with "open intervals of integers" entirely. When running the initial bug (the for loop), you will find that the analyzer prints "UNKNOWN" twice for the postfix operator, and "TRUE" "UNKNOWN" for other operators. This patch reduces postfix to the same state as the other operators. The second "UNKNOWN" seems to be due to a second "iterated" pass through the loop with a widening_svalue. A full fix of the bug will need a handler for the widening svalue, and much more critically, a correct merge of the constraints at the loop entrance. That, unfortunately, looks to be a hard problem. This patch fixes a few xfails as noted in the commit message. These were tests that were evidently devised to test whether the analyzer would understand arithmetic being done on constrained values. Addition and subtraction is now working as expected, a handler for multiplication and division can be added. As was noted in those test files, consideration at some point should be given to overflow. Thank you, Brian Sent with ProtonMail Secure Email.commit d4052e8c273ca267f6dcf782084d60acfc50a609 Author: Brian Sobulefsky Date: Sat Feb 27 00:36:40 2021 -0800 Changes to support eventual solution to bug PR analyzer/94362. This bug originated with a false positive null dereference during compilation of openssl. The bug is in effect caused by an error in constraint handling, specifically that within the for block: for ( ; i-- > 0; ) { } the constraint_manager should know i >= 0 but does not. A reduced form of this bug was found where the constraint manager did not know within the if block: if (i-- > 0) { } that i >= 0. This latter error was only present for the postfix operators, and not for other forms, like --i > 0. It was due to the constraint being set for the initial_svalue associated with i, but a binop_svalue being what entered the if block for which no constraint rules existed. By adding handling logic for a binop_svalue that adds or subtracts a constant, this problem was solved. This logic was added to a new method, constraint_manager::maybe_fold_condition, with the intent of eventually adding more cases there (unary_svalue and widening_svalue for example). Additionally, an off by one error was found in range::eval_condition that needed to be corrected to get the expected result. Correction of this error was done in that subroutine, resulting in no more calls to below_lower_bound and above_upper_bound. As such, these functions were commented out and may be removed if not needed for anything else. This change does not entirely fix the initial bug pr94362, but it reduces the postfix operator to the same state as other operators. In the case of the for loop, there appears to be an "initial pass" through the loop, which the analyzer will now understand for postfix, and then an "iterated pass" with a widening_svalue that the analyzer does not understand for any condition found. This seems to be due to the merging of constraints and is under investigation. While
Re: PR analyzer/94362 Partial Fix
Hi, It may not be worth altering at this point, but it seems like it would leave less bugs open if all the constraints go in as "closed" ranges and all evals are translated to closed intervals. So, if (idx > 0) and if (idx >= 1) are the same constraint. I know this won't be an option for eventual float support, but that is a different can of worms. For integers, you can fix it at those entry points and then all other subroutines can ignore the issue of open or closed ranges. I fully understand the eye glaze and did not want to have to write it that way. I am thinking if there is a cleaner way to do it. Anyway, that is why I put a comment in each case to derive the result. This issue of "sides of the condition" and "inverted operator" as you call it in some places is a recurring theme. It is especially irritating when we lose commutativity, as we do with MINUS. Adding logic in my subroutine for MULT or DIV is not hard, handling overflow is a bit more involved. At the very least, we would need to know what the max or min of a particular variable is, which might be in the type tree. We would also need to define how we want to handle the issue. The problem is (and I have been thinking about this a lot in terms of constraint merging), there are currently no "or constraints," which would be helpful in merging too. So for overflow, when you have something like if (idx > 0) { idx += num; __analyzer_eval(idx > num); } you have gone from a single constraint (idx > 0), to an "or condition" (idx > num || idx < MIN_INT + num). The only solution now, other than ignoring overflow as a bug that is tolerated, is to just pass it off as unhandled (and therefore UNKNOWN). Perhaps you may want to add overflow as one of your analyzer warnings if it cannot be ruled out? I did try to run a test with a simple || in the condition before just to see what would happen, and as you probably know it is not handled at all. I did not watch in gdb, but it is obvious from constraint-manager.cc that there is nothing to handle it. I think I actually did an __analyzer_eval() of the same || condition verbatim that was in the if() conditional and still got an UNKNOWN. It is a pretty intrusive change to add logic for that, which is why I have not done any work on it yet. Without the concept of "or" I don't see how we could handle overflow, but maybe you don't really want to handle it anyway, but rather just emit a warning if it might be considered poor practice to rely on something that is technically machine dependent anyway.
Re: PR analyzer/94362 Partial Fix
I have been kicking these sorts of ideas around ever since I came to understand that the second "UNKNOWN" in the for loop that originally started this was due to the state merge as we loop. For now, and I don't mean this disrespectfully because it is very hard to get right, the whole issue of merging has basically been punted, given some of the simple examples we found that will merge as an unknown svalue. As you think about this issue, "scope creep" becomes a concern quickly. It quickly turns into a halting problem of sorts, you have to decide how much of you want the analyzer to be able to "understand" a program. For example, any human can follow this: sum = 0; for (idx = 1; idx <= 10; idx++) sum += idx; __analyzer_eval (sum == 55); but from an analyzer perspective it opens up all sorts of questions and becomes a bit of a PhD thesis as to where you draw the line. The biggest concern with the analyzer seems to be vulnerabilities, so I doubt it is useful to get the analyzer to produce the correct answer for the above code, although it might be interesting to do so from an academic perspective. The example you provided gives a concrete reason that overflow should not be a complete "punt" and I like it. In the interest of fighting scope creep and keeping things manageable, I would question whether you want to actually track the overflow / no overflow cases separately or just raise any possible overflow as an error immediately. I am not disputing your idea, I would prefer to track the overflow and get a correct result (in this case, an under allocation of memory). I guess I would want to know how much work you think that will be. You still know the codebase a lot better than I do. My devil's advocate position would be if the analyzer raises exception on any possible overflow, will that overwhelm the user with false positives? I am not sure of the answer here, because a piece of me feels that overflow is not something that production code should be relying on in any serious application, and so should be non existent, but I am not sure if that is reflective of reality. The simplest way to handle your example would be the following: struct foo * make_arr (size_t n) { if (MAX_INT / sizeof (struct foo) >= n) return NULL; //continue with what you wrote } This should add a constraint that downstream of the initial guard, n is small enough to prevent overflow (I would have to check, but the current analyzer should be close to doing this if not already correct). Therefore, all we would need would be the check at the definition of sz as to whether it overflows, and that check should come back negative in my example, unknown in yours. Assuming we agree that the purpose of the analyzer is to prevent vulnerabilities, and not to provide an academic exercise in seeing how close we get to solving the halting problem, we could just treat any possible overflow as an error. As this was fundamentally your project, I don't want to tell you how to do it, and the nerd in me wants to build an analyzer that can answer all the silly puzzle code I can think to feed it, but from a utilitarian perspective and given everyone's limited time, I thought I would offer this as a path you can try.
Re: PR analyzer/94362 Partial Fix
Wow! I wasn't expecting that to work. Obviously we know that there is currently no handler for binop_svalue in the constraints so I would have to watch it run with state merging disabled to see how it is managing the unroll. The fact that merging breaks it is indicative of what we are saying though, that the constraint and merge model is currently insufficient. Hash algorithms may provide a counterexample for legitimate use of overflow. Anyway, I would prefer tracking the split too, but it is a big change. One way is state split, like you say, but that is a pretty invasive change to the way the graph works. The other way is to handle "or constraints" as I have said. This is an invasive change to the constraint model, but arguably the concept of "or" cannot be ignored forever. Thinking about it, I guess currently the concept of "and" is handled by the constraints (all constraints in a model exist as a big "and") and the concept of "or" is handled by the graph. This could be acceptable but we cannot split the graph arbitrarily, so there is currently no way to handle even a basic if (i == 1 || i == 10) what should be a very simple conditional. Handling a hash algorithm, like you say, is good to keep in mind, because we don't want an explosion of possibilities. If done right, the analyzer should understand for hashing that anything + anything => anything, and we see no explosion of state. By "raise an exception" I did mean issue an analyzer warning, yes. Perhaps the simple answer is to just track the svalue as a possible overflow in the state machine and report the warning for certain uses, like the alloc family, as you said. Regardless, proper overflow handling renders my naive binop handler unusable, because all it does is fold the condition and recur. There is basically no logic to it, and once you reenter eval_condition it is not possible to know how you got there.
Re: PR analyzer/94362 Partial Fix
Agreed too. Generic "error on overflow" is not an answer, and ignoring overflow is not an answer either because flagging faulty memory allocations is an important feature. Brian Sent with ProtonMail Secure Email. ‐‐‐ Original Message ‐‐‐ On Tuesday, March 2, 2021 6:09 PM, Jeff Law wrote: > On 3/2/21 6:40 PM, David Malcolm via Gcc-patches wrote: > > > > My devil's advocate position would be if the analyzer raises > > > exception on > > > any possible overflow, will that overwhelm the user with false > > > positives? > > > Presumably by "raise exception" you mean "issue a diagnostic and stop > > > analyzing this path", right? > > > > I think the point is to detect where numerical overflow can lead to > > e.g. a buffer overflow, rather than complain about numerical overflow > > in its own right, like in the make_arr example I gave earlier. > > WRT overflow, IMHO, the most valuable case to detect overflows is when > they feed an allocation via malloc/alloca. If an attacker can arrange > to overflow the size computation, then they can cause an > under-allocation which in turn opens the ability to over-write the stack > or heap data structures, which in turn are great attack vectors. > > And in case you think that's contrived, it isn't :-) > > http://phrack.org/issues/67/9.html > > > > I > > > am not sure of the answer here, because a piece of me feels that > > > overflow is not > > > something that production code should be relying on in any serious > > > application, > > > and so should be non existent, but I am not sure if that is > > > reflective of > > > reality. > > > My belief is that production code is full of overflows, but only some > > > of them are security-sensitive. Consider e.g. hashing algorithms that > > > sum some values and beningly assume overflow for wraparound as opposed > > > to the "calculate the size of the buffer to be allocated" example > > > (where the overflow is a classic security pitfall). > > Agreed. > > Jeff