NoQ created this revision.

I've seen a few false positives that appear because we construct C++11 
`std::initializer_list` objects with brace initializers, and such construction 
is not properly modeled. For instance, if a new object is constructed on the 
heap only to be put into a brace-initialized STL container, the object is 
reported to be leaked.

Approach (0): This can be trivially fixed by this patch, which causes pointers 
passed into initializer list expressions to immediately escape.

This fix is overly conservative though. So i did a bit of investigation as to 
how model `std::initializer_list` better.

According to the standard, `std::initializer_list<T>` is an object that has 
methods `begin()`, `end()`, and `size()`, where `begin()` returns a pointer to 
continous array of `size()` objects of type `T`, and `end()` is equal to 
`begin()` plus `size()`. The standard does hint that it should be possible to 
implement `std::initializer_list<T>` as a pair of pointers, or as a pointer and 
a size integer, however specific fields that the object would contain are an 
implementation detail.

Ideally, we should be able to model the initializer list's methods precisely. 
Or, at least, it should be possible to explain to the analyzer that the list 
somehow "takes hold" of the values put into it. Initializer lists can also be 
copied, which is a separate story that i'm not trying to address here.

The obvious approach to modeling `std::initializer_list` in a //checker// would 
be to construct a `SymbolMetadata` for the memory region of the initializer 
list object, which would be of type `T*` and represent `begin()`, so we'd 
trivially model `begin()` as a function that returns this symbol. The array 
pointed to by that symbol would be `bindLoc()`ed to contain the list's contents 
(probably as a `CompoundVal` to produce less bindings in the store). Extent of 
this array would represent `size()` and would be equal to the length of the 
list as written.

So this sounds good, however apparently it does nothing to address our false 
positives: when the list escapes, our `RegionStoreManager` is not magically 
guessing that the metadata symbol attached to it, together with its contents, 
should also escape. In fact, it's impossible to trigger a pointer escape from 
within the checker.

Approach (1): If only we enabled `ProgramState::bindLoc(..., 
notifyChanges=true)` to cause pointer escapes (not only region changes) (which 
sounds like the right thing to do anyway) such checker would be able to solve 
the false positives by triggering escapes when binding list elements to the 
list. However, it'd be as conservative as the current patch's solution. 
Ideally, we do not want escapes to happen so early. Instead, we'd prefer them 
to be delayed until the list itself escapes.

So i believe that escaping metadata symbols whenever their base regions escape 
would be the right thing to do. Currently we didn't think about that because we 
had neither pointer-type metadatas nor non-pointer escapes.

Approach (2): We could teach the Store to scan itself for bindings to 
metadata-symbolic-based regions during `scanReachableSymbols()` whenever a 
region turns out to be reachable. This requires no work on checker side, but it 
sounds performance-heavy.

Approach (3): We could let checkers maintain the set of active metadata symbols 
in the program state (ideally somewhere in the Store, which sounds weird but 
causes the smallest amount of layering violations), so that the core knew what 
to escape. This puts a stress on the checkers, but with a smart data map it 
wouldn't be a problem.

Approach (4): We could allow checkers to trigger pointer escapes in arbitrary 
moments. If we allow doing this within `checkPointerEscape` callback itself, we 
would be able to express facts like "when this region escapes, that metadata 
symbol attached to it should also escape". This sounds like an ultimate 
freedom, with maximum stress on the checkers - still not too much stress when 
we have smart data maps.

Does anybody like/hate any of these solutions or have anything better in mind? 
I'm personally liking the solution with `scanReachableSymbols()` - it should be 
possible to avoid performance overhead, and clarity seems nice.


https://reviews.llvm.org/D35216

Files:
  lib/StaticAnalyzer/Core/ExprEngine.cpp
  test/Analysis/initializer.cpp

Index: test/Analysis/initializer.cpp
===================================================================
--- test/Analysis/initializer.cpp
+++ test/Analysis/initializer.cpp
@@ -1,7 +1,9 @@
-// RUN: %clang_analyze_cc1 -analyzer-checker=core,unix.Malloc,debug.ExprInspection -analyzer-config c++-inlining=constructors -std=c++11 -verify %s
+// RUN: %clang_analyze_cc1 -analyzer-checker=core,unix.Malloc,cplusplus.NewDeleteLeaks,debug.ExprInspection -analyzer-config c++-inlining=constructors -std=c++11 -verify %s
 
 void clang_analyzer_eval(bool);
 
+#include "Inputs/system-header-simulator-cxx.h"
+
 class A {
   int x;
 public:
@@ -204,3 +206,13 @@
    const char(&f)[2];
 };
 }
+
+namespace CXX_initializer_lists {
+struct C {
+  C(std::initializer_list<int *> list);
+};
+void foo() {
+  int *x = new int;
+  C c{x}; // no-warning
+}
+}
Index: lib/StaticAnalyzer/Core/ExprEngine.cpp
===================================================================
--- lib/StaticAnalyzer/Core/ExprEngine.cpp
+++ lib/StaticAnalyzer/Core/ExprEngine.cpp
@@ -802,6 +802,21 @@
   }
 }
 
+namespace {
+class CollectReachableSymbolsCallback final : public SymbolVisitor {
+  InvalidatedSymbols Symbols;
+
+public:
+  CollectReachableSymbolsCallback(ProgramStateRef State) {}
+  const InvalidatedSymbols &getSymbols() const { return Symbols; }
+
+  bool VisitSymbol(SymbolRef Sym) override {
+    Symbols.insert(Sym);
+    return true;
+  }
+};
+} // end anonymous namespace
+
 void ExprEngine::Visit(const Stmt *S, ExplodedNode *Pred,
                        ExplodedNodeSet &DstTop) {
   PrettyStackTraceLoc CrashInfo(getContext().getSourceManager(),
@@ -1078,8 +1093,24 @@
         SVal result = svalBuilder.conjureSymbolVal(nullptr, Ex, LCtx,
                                                    resultType,
                                                    currBldrCtx->blockCount());
-        ProgramStateRef state = N->getState()->BindExpr(Ex, LCtx, result);
-        Bldr2.generateNode(S, N, state);
+        ProgramStateRef State = N->getState()->BindExpr(Ex, LCtx, result);
+
+        // FIXME: Remove this first check when we begin modelling these
+        // expression classes separately.
+        if (!isa<ObjCBoxedExpr>(Ex))
+          for (auto Child : Ex->children()) {
+            if (!Child)
+              continue;
+            SVal Val = State->getSVal(Child, LCtx);
+            CollectReachableSymbolsCallback Scanner =
+                State->scanReachableSymbols<CollectReachableSymbolsCallback>(
+                    Val);
+            const InvalidatedSymbols &EscapedSymbols = Scanner.getSymbols();
+            State = getCheckerManager().runCheckersForPointerEscape(
+                State, EscapedSymbols,
+                /*CallEvent*/ nullptr, PSK_EscapeOther, nullptr);
+          }
+        Bldr2.generateNode(S, N, State);
       }
 
       getCheckerManager().runCheckersForPostStmt(Dst, Tmp, S, *this);
@@ -2193,21 +2224,6 @@
   getCheckerManager().runCheckersForPostStmt(Dst, AfterInvalidateSet, AE, *this);
 }
 
-namespace {
-class CollectReachableSymbolsCallback final : public SymbolVisitor {
-  InvalidatedSymbols Symbols;
-
-public:
-  CollectReachableSymbolsCallback(ProgramStateRef State) {}
-  const InvalidatedSymbols &getSymbols() const { return Symbols; }
-
-  bool VisitSymbol(SymbolRef Sym) override {
-    Symbols.insert(Sym);
-    return true;
-  }
-};
-} // end anonymous namespace
-
 // A value escapes in three possible cases:
 // (1) We are binding to something that is not a memory region.
 // (2) We are binding to a MemrRegion that does not have stack storage.
_______________________________________________
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

Reply via email to