[PATCH] D106823: [analyzer][solver] Iterate to a fixpoint during symbol simplification with constants

Gabor Marton via Phabricator via cfe-commits Mon, 26 Jul 2021 13:59:06 -0700

martong created this revision.
martong added reviewers: vsavchenko, NoQ, steakhal.
Herald added subscribers: manas, ASDenysPetrov, gamesh411, dkrupp, donat.nagy, 
Szelethus, mikhail.ramalho, a.sidorin, rnkovacs, szepet, baloghadamsoftware, 
xazax.hun, whisperity.
Herald added a reviewer: Szelethus.
martong requested review of this revision.
Herald added a project: clang.
Herald added a subscriber: cfe-commits.


D103314 <https://reviews.llvm.org/D103314> introduced symbol simplification 
when a new constant constraint is
added. Currently, we simplify existing equivalence classes by iterating over
all existing members of them and trying to simplify each member symbol with
simplifySVal.

At the end of such a simplification round we may end up introducing a
new constant constraint. Example:

  if (a + b + c != d)
    return;
  if (c + b != 0)
    return;
  // Simplification starts here.
  if (b != 0)
    return;

The `c == 0` constraint is the result of the first simplification iteration.
However, we could do another round of simplification to reach the conclusion
that `a == d`. Generally, we could do as many new iterations until we reach a
fixpoint.

Why should we do this? By reaching a fixpoint in simplification we are capable
of discovering infeasible states at the moment of the introduction of the
**first** constant constraint.
Let's modify the previous example just a bit, and consider what happens without
the fixpoint iteration.

  if (a + b + c != d)
    return;
  if (c + b != 0)
    return;
  // Adding a new constraint.
  if (a == d)
    return;
  // This brings in a contradiction.
  if (b != 0)
    return;
  clang_analyzer_warnIfReached(); // This produces a warning.
              // The path is already infeasible...
  if (c == 0) // ...but we realize that only when we evaluate `c == 0`.
    return;

What happens currently, without the fixpoint iteration? As the inline comments
suggest, without the fixpoint iteration we are doomed to realize that we are on
an infeasible path only after we are already walking on that. With fixpoint
iteration we can detect that before stepping on that. With fixpoint iteration,
the `clang_analyzer_warnIfReached` does not warn in the above example b/c
during the evaluation of `b == 0` we realize the contradiction. The engine and
the checkers do rely on that either `assume(Cond)` or `assume(!Cond)` should be
feasible. This is in fact assured by the so called expensive checks
(LLVM_ENABLE_EXPENSIVE_CHECKS). The StdLibraryFuncionsChecker is notably one of
the checkers that has a very similar assertion.

Consequently, either we do a fixpoint iteration or we should dump the whole
idea of the symbol simplification.

Before this patch, we simply added the simplified symbol to the equivalence
class. In this patch, after we have added the simplified symbol, we remove the
old (more complex) symbol from the members of the equivalence class
(`ClassMembers`). Removing the old symbol is beneficial because during the next
iteration of the simplification we don't have to consider again the old symbol.

Contrary to how we handle `ClassMembers`, we don't remove the old Sym->Class
relation from the `ClassMap`. This is important for two reasons: The
constraints of the old symbol can still be found via it's equivalence class
that it used to be the member of (1). We can spare one removal and thus one
additional tree in the forest of `ClassMap` (2).

Performance and complexity: Let us assume that in a State we have N non-trivial
equivalence classes and that all constraints and disequality info is related to
non-trivial classes. In the worst case, we can simplify only one symbol of one
class in each iteration. The number of symbols in one class cannot grow b/c we
replace the old symbol with the simplified one. Also, the number of the
equivalence classes can decrease only, b/c the algorithm does a merge operation
optionally. We need N iterations in this case to reach the fixpoint. Thus, the
steps needed to be done in the worst case is proportional to `N*N`. Empirical
results (attached) show that there is some hardly noticeable run-time and peak
memory discrepancy compared to the baseline. In my opinion, these differences
could be the result of measurement error.
This worst case scenario can be extended to that cases when we have trivial
classes in the constraints and in the disequality map are transforming to such
a State where there are only non-trivial classes, b/c the algorithm does merge
operations. A merge operation on two trivial classes results in one non-trivial
class.


Repository:
  rG LLVM Github Monorepo

https://reviews.llvm.org/D106823

Files:
  clang/lib/StaticAnalyzer/Core/RangeConstraintManager.cpp
  clang/test/Analysis/expr-inspection-printState-eq-classes.c
  clang/test/Analysis/symbol-simplification-disequality-info.cpp
  
clang/test/Analysis/symbol-simplification-fixpoint-iteration-unreachable-code.cpp
  clang/test/Analysis/symbol-simplification-fixpoint-one-iteration.cpp
  clang/test/Analysis/symbol-simplification-fixpoint-two-iterations.cpp

Index: clang/test/Analysis/symbol-simplification-fixpoint-two-iterations.cpp
===================================================================
--- /dev/null
+++ clang/test/Analysis/symbol-simplification-fixpoint-two-iterations.cpp
@@ -0,0 +1,45 @@
+// RUN: %clang_analyze_cc1 %s \
+// RUN:   -analyzer-checker=core \
+// RUN:   -analyzer-checker=debug.ExprInspection \
+// RUN:   2>&1 | FileCheck %s
+
+// In this test we check whether the solver's symbol simplification mechanism
+// is capable of reaching a fixpoint. This should be done after TWO iterations.
+
+void clang_analyzer_printState();
+
+void test_contradiction(int a, int b, int c, int d) {
+  if (a + b + c != d)
+    return;
+  if (c + b != 0)
+    return;
+  clang_analyzer_printState();
+  // CHECK:      "constraints": [
+  // CHECK-NEXT:   { "symbol": "(((reg_$0<int a>) + (reg_$1<int b>)) + (reg_$2<int c>)) != (reg_$3<int d>)", "range": "{ [0, 0] }" },
+  // CHECK-NEXT:   { "symbol": "(reg_$2<int c>) + (reg_$1<int b>)", "range": "{ [0, 0] }" }
+  // CHECK-NEXT: ],
+  // CHECK-NEXT: "equivalence_classes": [
+  // CHECK-NEXT:   [ "((reg_$0<int a>) + (reg_$1<int b>)) + (reg_$2<int c>)", "reg_$3<int d>" ]
+  // CHECK-NEXT: ],
+  // CHECK-NEXT: "disequality_info": null,
+
+  // Simplification starts here.
+  if (b != 0)
+    return;
+  clang_analyzer_printState();
+  // CHECK:       "constraints": [
+  // CHECK-NEXT:    { "symbol": "(reg_$0<int a>) != (reg_$3<int d>)", "range": "{ [0, 0] }" },
+  // CHECK-NEXT:    { "symbol": "reg_$1<int b>", "range": "{ [0, 0] }" },
+  // CHECK-NEXT:    { "symbol": "reg_$2<int c>", "range": "{ [0, 0] }" }
+  // CHECK-NEXT:  ],
+  // CHECK-NEXT:  "equivalence_classes": [
+  // CHECK-NEXT:    [ "(reg_$0<int a>) != (reg_$3<int d>)" ],
+  // CHECK-NEXT:    [ "reg_$0<int a>", "reg_$3<int d>" ],
+  // CHECK-NEXT:    [ "reg_$2<int c>" ]
+  // CHECK-NEXT:  ],
+  // CHECK-NEXT:  "disequality_info": null,
+
+  // Keep the symbols and the constraints! alive.
+  (void)(a * b * c * d);
+  return;
+}
Index: clang/test/Analysis/symbol-simplification-fixpoint-one-iteration.cpp
===================================================================
--- /dev/null
+++ clang/test/Analysis/symbol-simplification-fixpoint-one-iteration.cpp
@@ -0,0 +1,40 @@
+// RUN: %clang_analyze_cc1 %s \
+// RUN:   -analyzer-checker=core \
+// RUN:   -analyzer-checker=debug.ExprInspection \
+// RUN:   2>&1 | FileCheck %s
+
+// In this test we check whether the solver's symbol simplification mechanism
+// is capable of reaching a fixpoint. This should be done after one iteration.
+
+void clang_analyzer_printState();
+
+void test(int a, int b, int c) {
+  if (a + b != c)
+    return;
+  clang_analyzer_printState();
+  // CHECK:      "constraints": [
+  // CHECK-NEXT:   { "symbol": "((reg_$0<int a>) + (reg_$1<int b>)) != (reg_$2<int c>)", "range": "{ [0, 0] }" }
+  // CHECK-NEXT: ],
+  // CHECK-NEXT: "equivalence_classes": [
+  // CHECK-NEXT:   [ "(reg_$0<int a>) + (reg_$1<int b>)", "reg_$2<int c>" ]
+  // CHECK-NEXT: ],
+  // CHECK-NEXT: "disequality_info": null,
+
+  // Simplification starts here.
+  if (b != 0)
+    return;
+  clang_analyzer_printState();
+  // CHECK:        "constraints": [
+  // CHECK-NEXT:     { "symbol": "(reg_$0<int a>) != (reg_$2<int c>)", "range": "{ [0, 0] }" },
+  // CHECK-NEXT:     { "symbol": "reg_$1<int b>", "range": "{ [0, 0] }" }
+  // CHECK-NEXT:   ],
+  // CHECK-NEXT:   "equivalence_classes": [
+  // CHECK-NEXT:     [ "(reg_$0<int a>) != (reg_$2<int c>)" ],
+  // CHECK-NEXT:     [ "reg_$0<int a>", "reg_$2<int c>" ]
+  // CHECK-NEXT:   ],
+  // CHECK-NEXT: "disequality_info": null,
+
+  // Keep the symbols and the constraints! alive.
+  (void)(a * b * c);
+  return;
+}
Index: clang/test/Analysis/symbol-simplification-fixpoint-iteration-unreachable-code.cpp
===================================================================
--- /dev/null
+++ clang/test/Analysis/symbol-simplification-fixpoint-iteration-unreachable-code.cpp
@@ -0,0 +1,47 @@
+// RUN: %clang_analyze_cc1 %s \
+// RUN:   -analyzer-checker=core \
+// RUN:   -analyzer-checker=debug.ExprInspection \
+// RUN:   -verify
+
+// In this test we check whether the solver's symbol simplification mechanism
+// is capable of reaching a fixpoint.
+
+void clang_analyzer_printState();
+void clang_analyzer_warnIfReached();
+
+void test_contradiction(int a, int b, int c, int d, int x) {
+  if (a + b + c != d)
+    return;
+  if (a == d)
+    return;
+  if (c + b != 0)
+    return;
+  clang_analyzer_warnIfReached(); // expected-warning{{REACHABLE}}
+  clang_analyzer_printState();
+
+  // Bring in the contradiction.
+  if (b != 0)
+    return;
+
+  // After the simplification of `b == 0` we have:
+  //   b == 0
+  //   a + c == d
+  //   a != d
+  //   c == 0
+  // Doing another iteration we reach the fixpoint (with a contradiction):
+  //   b == 0
+  //   a == d
+  //   a != d
+  //   c == 0
+  clang_analyzer_warnIfReached(); // no-warning, i.e. UNREACHABLE
+  clang_analyzer_printState();
+
+  // Enabling expensive checks would trigger an assertion failure here without
+  // the fixpoint iteration.
+  if (a + c == x)
+    return;
+
+  // Keep the symbols and the constraints! alive.
+  (void)(a * b * c * d * x);
+  return;
+}
Index: clang/test/Analysis/symbol-simplification-disequality-info.cpp
===================================================================
--- /dev/null
+++ clang/test/Analysis/symbol-simplification-disequality-info.cpp
@@ -0,0 +1,65 @@
+// RUN: %clang_analyze_cc1 %s \
+// RUN:   -analyzer-checker=core \
+// RUN:   -analyzer-checker=debug.ExprInspection \
+// RUN:   2>&1 | FileCheck %s
+
+// In this test we check how the solver's symbol simplification mechanism
+// simplifies the disequality info.
+
+void clang_analyzer_printState();
+
+void test_contradiction(int a, int b, int c, int d) {
+  if (a + b + c == d)
+    return;
+  clang_analyzer_printState();
+  // CHECK:       "disequality_info": [
+  // CHECK-NEXT:    {
+  // CHECK-NEXT:      "class": [ "((reg_$0<int a>) + (reg_$1<int b>)) + (reg_$2<int c>)" ],
+  // CHECK-NEXT:      "disequal_to": [
+  // CHECK-NEXT:        [ "reg_$3<int d>" ]]
+  // CHECK-NEXT:    },
+  // CHECK-NEXT:    {
+  // CHECK-NEXT:      "class": [ "reg_$3<int d>" ],
+  // CHECK-NEXT:      "disequal_to": [
+  // CHECK-NEXT:        [ "((reg_$0<int a>) + (reg_$1<int b>)) + (reg_$2<int c>)" ]]
+  // CHECK-NEXT:    }
+  // CHECK-NEXT:  ],
+
+
+  // Simplification starts here.
+  if (b != 0)
+    return;
+  clang_analyzer_printState();
+  // CHECK:      "disequality_info": [
+  // CHECK-NEXT:   {
+  // CHECK-NEXT:     "class": [ "(reg_$0<int a>) + (reg_$2<int c>)" ],
+  // CHECK-NEXT:     "disequal_to": [
+  // CHECK-NEXT:       [ "reg_$3<int d>" ]]
+  // CHECK-NEXT:   },
+  // CHECK-NEXT:   {
+  // CHECK-NEXT:     "class": [ "reg_$3<int d>" ],
+  // CHECK-NEXT:     "disequal_to": [
+  // CHECK-NEXT:        [ "(reg_$0<int a>) + (reg_$2<int c>)" ]]
+  // CHECK-NEXT:    }
+  // CHECK-NEXT:  ],
+
+  if (c != 0)
+    return;
+  clang_analyzer_printState();
+  // CHECK:       "disequality_info": [
+  // CHECK-NEXT:    {
+  // CHECK-NEXT:      "class": [ "reg_$0<int a>" ],
+  // CHECK-NEXT:      "disequal_to": [
+  // CHECK-NEXT:        [ "reg_$3<int d>" ]]
+  // CHECK-NEXT:    },
+  // CHECK-NEXT:    {
+  // CHECK-NEXT:      "class": [ "reg_$3<int d>" ],
+  // CHECK-NEXT:      "disequal_to": [
+  // CHECK-NEXT:        [ "reg_$0<int a>" ]]
+  // CHECK-NEXT:    }
+  // CHECK-NEXT:  ],
+
+  // Keep the symbols and the constraints! alive.
+  (void)(a * b * c * d);
+  return;
+}
Index: clang/test/Analysis/expr-inspection-printState-eq-classes.c
===================================================================
--- clang/test/Analysis/expr-inspection-printState-eq-classes.c
+++ clang/test/Analysis/expr-inspection-printState-eq-classes.c
@@ -8,14 +8,13 @@
     return;
   if (a != d)
     return;
-  if (b != 0)
+  if (a != c)
     return;
   clang_analyzer_printState();
   (void)(a * b * c * d);
   return;
 }
 
-  // CHECK:      "equivalence_classes": [
-  // CHECK-NEXT:   [ "((reg_$0<int a>) + (reg_$1<int b>)) != (reg_$2<int c>)", "(reg_$0<int a>) != (reg_$2<int c>)" ],
-  // CHECK-NEXT:   [ "(reg_$0<int a>) + (reg_$1<int b>)", "reg_$0<int a>", "reg_$2<int c>", "reg_$3<int d>" ]
-  // CHECK-NEXT: ],
+ // CHECK:       "equivalence_classes": [
+ // CHECK-NEXT:    [ "(reg_$0<int a>) + (reg_$1<int b>)", "reg_$0<int a>", "reg_$2<int c>", "reg_$3<int d>" ]
+ // CHECK-NEXT:  ],
Index: clang/lib/StaticAnalyzer/Core/RangeConstraintManager.cpp
===================================================================
--- clang/lib/StaticAnalyzer/Core/RangeConstraintManager.cpp
+++ clang/lib/StaticAnalyzer/Core/RangeConstraintManager.cpp
@@ -605,6 +605,10 @@
                                                         ProgramStateRef State,
                                                         EquivalenceClass Class);
 
+  /// Remove one member from the class.
+  LLVM_NODISCARD ProgramStateRef removeMember(ProgramStateRef State,
+                                              const SymbolRef Old);
+
   void dumpToStream(ProgramStateRef State, raw_ostream &os) const;
   LLVM_DUMP_METHOD void dump(ProgramStateRef State) const {
     dumpToStream(State, llvm::errs());
@@ -1692,29 +1696,43 @@
 
 bool ConstraintAssignor::assignSymExprToConst(const SymExpr *Sym,
                                               const llvm::APSInt &Constraint) {
-  llvm::SmallSet<EquivalenceClass, 4> SimplifiedClasses;
-  // Iterate over all equivalence classes and try to simplify them.
-  ClassMembersTy Members = State->get<ClassMembers>();
-  for (std::pair<EquivalenceClass, SymbolSet> ClassToSymbolSet : Members) {
-    EquivalenceClass Class = ClassToSymbolSet.first;
-    State = EquivalenceClass::simplify(Builder, RangeFactory, State, Class);
-    if (!State)
-      return false;
-    SimplifiedClasses.insert(Class);
-  }
+  ProgramStateRef OldState;
+  do {
+    OldState = State;
+
+    // Iterate over all equivalence classes and try to simplify them.
+    ClassMembersTy Members = State->get<ClassMembers>();
+    for (std::pair<EquivalenceClass, SymbolSet> ClassToSymbolSet : Members) {
+      EquivalenceClass Class = ClassToSymbolSet.first;
+      State = EquivalenceClass::simplify(Builder, RangeFactory, State, Class);
+      if (!State)
+        return false;
+    }
 
-  // Trivial equivalence classes (those that have only one symbol member) are
-  // not stored in the State. Thus, we must skim through the constraints as
-  // well. And we try to simplify symbols in the constraints.
-  ConstraintRangeTy Constraints = State->get<ConstraintRange>();
-  for (std::pair<EquivalenceClass, RangeSet> ClassConstraint : Constraints) {
-    EquivalenceClass Class = ClassConstraint.first;
-    if (SimplifiedClasses.count(Class)) // Already simplified.
-      continue;
-    State = EquivalenceClass::simplify(Builder, RangeFactory, State, Class);
-    if (!State)
-      return false;
-  }
+    // Trivial equivalence classes (those that have only one symbol member) are
+    // not stored in the State. Thus, we must skim through the constraints as
+    // well. And we try to simplify symbols in the constraints.
+    ConstraintRangeTy Constraints = State->get<ConstraintRange>();
+    for (std::pair<EquivalenceClass, RangeSet> ClassConstraint : Constraints) {
+      EquivalenceClass Class = ClassConstraint.first;
+      State = EquivalenceClass::simplify(Builder, RangeFactory, State, Class);
+      if (!State)
+        return false;
+    }
+
+    // We may have trivial equivalence classes in the disequality info as
+    // well, and we need to simplify them.
+    DisequalityMapTy DisequalityInfo = State->get<DisequalityMap>();
+    for (std::pair<EquivalenceClass, ClassSet> DisequalityEntry :
+         DisequalityInfo) {
+      EquivalenceClass Class = DisequalityEntry.first;
+      ClassSet DisequalClasses = DisequalityEntry.second;
+      State = EquivalenceClass::simplify(Builder, RangeFactory, State, Class);
+      if (!State)
+        return false;
+    }
+
+  } while (State != OldState);
 
   return true;
 }
@@ -2086,6 +2104,32 @@
   return llvm::None;
 }
 
+LLVM_NODISCARD ProgramStateRef
+EquivalenceClass::removeMember(ProgramStateRef State, const SymbolRef Old) {
+
+  SymbolSet ClsMembers = getClassMembers(State);
+  assert(ClsMembers.contains(Old));
+  assert(ClsMembers.getHeight() > 1 &&
+         "Class should have at least two members");
+
+  // We don't remove `Old`'s Sym->Class relation for two reasons:
+  // 1) This way constraints for the old symbol can still be found via it's
+  // equivalence class that it used to be the member of.
+  // 2) Performance and resource reasons. We can spare one removal and thus one
+  // additional tree in the forest of `ClassMap`.
+
+  // Remove `Old`'s Class->Sym relation.
+  SymbolSet::Factory &F = getMembersFactory(State);
+  ClassMembersTy::Factory &EMFactory = State->get_context<ClassMembers>();
+  ClsMembers = F.remove(ClsMembers, Old);
+  // Overwrite the existing members assigned to this class.
+  ClassMembersTy ClassMembersMap = State->get<ClassMembers>();
+  ClassMembersMap = EMFactory.add(ClassMembersMap, *this, ClsMembers);
+  State = State->set<ClassMembers>(ClassMembersMap);
+
+  return State;
+}
+
 // Iterate over all symbols and try to simplify them. Once a symbol is
 // simplified then we check if we can merge the simplified symbol's equivalence
 // class to this class. This way, we simplify not just the symbols but the
@@ -2101,9 +2145,16 @@
       // The simplified symbol should be the member of the original Class,
       // however, it might be in another existing class at the moment. We
       // have to merge these classes.
+      ProgramStateRef OldState = State;
       State = merge(F, State, MemberSym, SimplifiedMemberSym);
       if (!State)
         return nullptr;
+      // No state change, no merge happened actually.
+      if (OldState == State)
+        continue;
+      assert(find(State, MemberSym) == find(State, SimplifiedMemberSym));
+      // Remove the old and more complex symbol.
+      State = find(State, MemberSym).removeMember(State, MemberSym);
     }
   }
   return State;

_______________________________________________
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[PATCH] D106823: [analyzer][solver] Iterate to a fixpoint during symbol simplification with constants

Reply via email to