[PATCH] D35216: [analyzer] Escape symbols when creating std::initializer_list.

Artem Dergachev via Phabricator via cfe-commits Wed, 19 Jul 2017 05:57:41 -0700

NoQ added a comment.

In https://reviews.llvm.org/D35216#814124, @dcoughlin wrote:


> In this case, I would be fine with some sort of AbstractStorageMemoryRegion 
> that meant "here is a memory region and somewhere reachable from here exists 
> another region of type T". Or even multiple regions with different 
> identifiers. This wouldn't specify how the memory is reachable, but it would 
> allow for transfer functions to get at those regions and it would allow for 
> invalidation.


Yeah, this is what we can easily implement now as a 
symbolic-region-based-on-a-metadata-symbol (though we can make a new region 
class for that if we eg. want it typed). The problem is that the relation 
between such storage region and its parent object region is essentially 
immaterial, similarly to the relation between SymbolRegionValue and its parent 
region. Region contents are mutable: today the abstract storage is reachable 
from its parent object, tomorrow it's not, and maybe something else becomes 
reachable, something that isn't even abstract. So the parent region for the 
abstract storage is most of the time at best a "nice to know" thing - we cannot 
rely on it to do any actual work. We'd anyway need to rely on the checker to do 
the job.

> For std::initializer_list this reachable region would the region for the 
> backing array and the transfer functions for begin() and end() yield the 
> beginning and end element regions for it.

So maybe in fact for std::initializer_list it may work fine because you cannot 
change the data after the object is constructed - so this region's contents are 
essentially immutable. For the future, i feel as if it is a dead end.

I'd like to consider another funny example. Suppose we're trying to model 
std::unique_ptr. Consider:

  void bar(const std::unique_ptr<int> &x);
  
  void foo(std::unique_ptr<int> &x) {
    int *a = x.get();   // (a, 0, direct): &AbstractStorageRegion
    *a = 1;             // (AbstractStorageRegion, 0, direct): 1 S32b
    int *b = new int;
    *b = 2;             // (SymRegion{conj_$0<int *>}, 0 ,direct): 2 S32b
    x.reset(b);         // Checker map: x -> SymRegion{conj_$0<int *>}
    bar(x);             // 'a' doesn't escape (the pointer was unique), 'b' 
does.
    clang_analyzer_eval(*a == 1); // Making this true is up to the checker.
    clang_analyzer_eval(*b == 2); // Making this unknown is up to the checker.
  }

The checker doesn't totally need to ensure that `*a == 1` passes - even though 
the pointer was unique, it could theoretically have `.get()`-ed above and the 
code could of course break the uniqueness invariant (though we'd probably want 
it). The checker can say that "even if `*a` did escape, it was not because it 
was stuffed directly into `bar()`".

The checker's direct responsibility, however, is to solve the `*b == 2` thing 
(which is in fact the problem we're dealing with in this patch - escaping the 
storage region of the object).

So we're talking about one more operation over the program state (scanning 
reachable symbols and regions) that cannot work without checker support.

We can probably add a new callback "checkReachableSymbols" to solve this. This 
is in fact also related to the dead symbols problem (we're scanning for live 
symbols in the store and in the checkers separately, but we need to do so 
simultaneously with a single worklist). Hmm, in fact this sounds like a good 
idea; we can replace checkLiveSymbols with checkReachableSymbols.

Or we could just have ghost member variables, and no checker support required 
at all. For ghost member variables, the relation with their parent region 
(which would be their superregion) is actually useful, the mutability of their 
contents is expressed naturally, and the store automagically sees reachable 
symbols, live symbols, escapes, invalidations, whatever.

> In my view this differs from ghost variables in that (1) this storage does 
> actually exist (it is just a library implementation detail where that storage 
> lives) and (2) it is perfectly valid for a pointer into that storage to be 
> returned and for another part of the program to read or write from that 
> storage. (Well, in this case just read since it is allowed to be read-only 
> memory).
> 
> What I'm not OK with is modeling abstract analysis state (for example, the 
> count of a NSMutableArray or the typestate of a file handle) as a value 
> stored in some ginned up region in the store.This takes an easy problem that 
> the analyzer does well at (modeling typestate) and turns it into a hard one 
> that the analyzer is bad at (reasoning about the contents of the heap).

Yeah, i tend to agree on that. For simple typestates, this is probably an 
overkill, so let's definitely put aside the idea of "ghost symbolic regions" 
that i had earlier.

But, to summarize a bit, in our current case, however, the typestate we're 
looking for //**is**// the contents of the heap. And when we try to model such 
typestates (complex in this specific manner, i.e. heap-like) in any checker, we 
have a choice between re-doing this modeling in every such checker (which is 
something analyzer is indeed good at, but at a price of making checkers heavy) 
or instead relying on the Store to do exactly what it's designed to do.

> I think the key criterion here is: "is the region accessible from outside the 
> library". That is, does the library expose the region as a pointer that can 
> be read to or written from in the client program? If so, then it makes sense 
> for this to be in the store: we are modeling reachable storage as storage. 
> But if we're just modeling arbitrary analysis facts that need to be 
> invalidated when a pointer escapes then we shouldn't try to gin up storage 
> for them just to get invalidation for free.

As a metaphor, i'd probably compare it to body farms - the difference between 
ghost member variables and metadata symbols seems to me like the difference 
between body farms and evalCall. Both are nice to have, and body farms are very 
pleasant to work with, even if not omnipotent. I think it's fine for a 
`FunctionDecl`'s body in a body farm to have a local variable, even if such 
variable doesn't actually exist, even if it cannot be seen from outside the 
function call. I'm not seeing immediate practical difference between "it does 
actually exist" and "it doesn't actually exist, just a handy abstraction". 
Similarly, i think it's fine if we have a `CXXRecordDecl` with 
implementation-defined contents, and try to farm up a member variable as a 
handy abstraction (we don't even need to know its name or offset, only that 
it's there somewhere).


https://reviews.llvm.org/D35216



_______________________________________________
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[PATCH] D35216: [analyzer] Escape symbols when creating std::initializer_list.

Reply via email to