[PATCH] D69813: [analyzer] CERTStrChecker: Model gets()

Csaba Dabis via Phabricator via cfe-commits Wed, 13 Nov 2019 12:22:55 -0800

Charusso added inline comments.


================
Comment at: clang/lib/StaticAnalyzer/Checkers/cert/StrChecker.cpp:201-207
+void CERTStrChecker::evalGets(const CallEvent &Call, const CallContext &CallC,
+                              CheckerContext &C) const {
+  unsigned DestPos = *CallC.DestinationPos;
+  const Expr *DestArg = Call.getArgExpr(DestPos)->IgnoreImpCasts();
+  SVal DestV = Call.getArgSVal(DestPos);
+
+  auto Report = getReport(*BT, Call, CallC, C);
----------------
Charusso wrote:
> NoQ wrote:
> > NoQ wrote:
> > > All right, so basically what you're saying is that literally every 
> > > invocation of `gets()` deserves a warning. This means that for all 
> > > practical purposes your checker //is// an AST-based checker, just 
> > > implemented with path-sensitive callbacks. A path-sensitive checker emits 
> > > warnings based on multiple events that happen sequentially along the path 
> > > (use-after-free: "memory deallocated - that same memory used", division 
> > > by zero: "value constrained to zero - something is being divided by that 
> > > same value", etc.) but your checker emits the warning by looking at only 
> > > one statement: "`gets()` is invoked".
> > > 
> > > Do i understand correctly that your plan is to use path-sensitive 
> > > analysis for fixits only? But you can't emit fixits for any truly 
> > > path-sensitive warning anyway. Fixits must work correctly on all 
> > > execution paths, so you cannot emit a correct fixit by looking at only 
> > > one execution path. In order to emit fixits, you need to either resort to 
> > > a pure AST check anyway ("this expression refers to an array of fixed 
> > > size"), or maybe implement auxiliary data-flow analysis for a certain 
> > > must-problem (eg., "the buffer argument may have exactly one possible 
> > > value across all paths that reach `gets()`"). But in both cases the 
> > > path-sensitive engine does literally nothing to help you; all the data 
> > > that you'll need for your fixit will be available from the AST.
> > Like, i think this was an interesting investigation and i was genuinely 
> > curious about how this turns out to be, but for now it seems that the 
> > problem you're trying to solve cannot be solved this way. Path sensitive 
> > analysis is fundamentally applicable to only 50% of the problems (to 
> > "may-problems" but not to "must-problems"), and the problem you're trying 
> > to solve is in the latter category. I believe you'll have to fall back to 
> > the relatively boring task of adding fixits to `security.insecureAPI.gets`; 
> > but then, again, if you manage to employ use-def chains for this problem, 
> > that might be quite an inspiring start.
> Most of the time the given allocation to hold the arbitrary string happens in 
> a local scope. After that we see `fscanf(dst)`, `gets(dst)`, `memcpy(dst)`, 
> `strncpy(dst)`, stuff... which pushes new data into that memory block, and 
> then the cool developers write that down: `dst[42] = '\0'` which means all 
> the reports should be thrown away in a path-sensitive manner on `dst`. 
> Reallocations, re-bindings, non-AST stuff could handled very easily with the 
> non-AST checker, like that one.
> 
> Sometimes we are work with destination-array like `memcpy(Foo[Bar.Baz]->Qux, 
> ...)` which could not really handled with just a simple AST-based checker. I 
> could not say at the moment we could handle it with symbols, but we have a 
> much larger scope of information by symbols.
> 
> Most of the time because of the Analyzer is much smarter than the Tidy we 
> could emit fix-its with the help of flow-sensitiveness very easily. I would 
> create huge white-lists what we want to fix-it, and what we could not, but at 
> some point if we model the symbols better, we can.
> 
> Other than that easy false-positive suppression and tiny flow-sensitive 
> rebinding stuff, we could be sure what is going on by each 
> string-manipulation. The `gets()` is a toy example where at most a `grep -rn 
> 'gets('` could do better analysis than us.
> 
> The real world looks like that:
> ```
> 1     encryptedpasswordlen = ((strlen(passwd) + RADIUS_VECTOR_LENGTH - 1)
>       / RADIUS_VECTOR_LENGTH) * RADIUS_VECTOR_LENGTH;
> 2     cryptvector = palloc(strlen(secret) + RADIUS_VECTOR_LENGTH);
> 3     memcpy(cryptvector, secret, strlen(secret));
> ...
> 4     for (i = 0; i < encryptedpasswordlen; i += RADIUS_VECTOR_LENGTH) {
> 5       memcpy(cryptvector + strlen(secret), md5trailer, 
> RADIUS_VECTOR_LENGTH);
> ...
> ```
> - from `postgresql/src/backend/libpq/auth.c`
> 
> At `3` we would emit a warning, because the null-termination left by the 
> wrong size of the string, but at `5` we see that, it left, because at that 
> offset the string continues, and dunno, on `6` when we model every 
> flow-sensitive information, the string left non-terminated.
> 
> Of course each of that stuff is local and AST-based (with huge overhead of 
> rebindings and impossible false-positive suppression), but when you have two 
> of it, that is when the fun begins.
> if you manage to employ use-def chains for this problem, that might be quite 
> an inspiring start.

We have regions so we do not need to rely on such chains in the AST-world, if I 
get your idea right by the "Use-define chain" wiki [1]. Btw. it is not that 
difficult problem in the AST-world, you need to create a recursive AST-matcher 
on the `DeclRefExpr` with `std::function`.

Basically, I want to implement all the STR rules in a logical order, here is 
one of the examples from STR32-C [2] which is my last planned project at the 
moment:

```
void lessen_memory_usage(void) {
  wchar_t *temp;
  size_t temp_size;
 
  /* ... */
 
  if (cur_msg != NULL) {
    temp_size = cur_msg_size / 2 + 1;
    temp = realloc(cur_msg, temp_size * sizeof(wchar_t));
    /* temp &and cur_msg may no longer be null-terminated */
    if (temp == NULL) {
      /* Handle error */
    }
 
    cur_msg = temp;
    cur_msg_size = temp_size;
    cur_msg_len = wcslen(cur_msg);
  }
}
```

They really want to represent the wild, and please think of that problem in 
terms of an AST-checker versus in terms of `getAsRegion` and 
`getDynamicElementCount` to compare the size of the allocated memory block and 
inject that: `cur_msg[temp_size - 1] = L'\0';` because the array would 
overflow. How cool is the Analyzer and how smart to do so.

It would took at most 10 minutes to implement if the `evalBinOp` would work or 
the main 10 years old implementation of obtaining the element-count would work. 
I am on the way to fixing the latter, but it will be more path-sensitive info, 
than you could imagine, like reusing the "zombie" size-expression turned out to 
be a hard problem. And it will be a lot easier to solve such problems, I 
believe.

The local scope is the key, and that checker at the moment only tries to 
rewrite destination-arrays which are local. I think we could see if we emit 
multiple reports on a given call so due to ambiguity we would drop such 
fix-its. With an AST checker I could not imagine how difficult it would be.

It is rather a research at the moment, because I have encountered dozens of 
silly stuff, beginning with the `getExtent()`, so I cannot say this direction 
is the 100% future, but I have picked the Analyzer over Tidy, for that reason, 
it is smarter.

[1] https://en.wikipedia.org/wiki/Use-define_chain
[2] 
https://wiki.sei.cmu.edu/confluence/display/c/STR32-C.+Do+not+pass+a+non-null-terminated+character+sequence+to+a+library+function+that+expects+a+string


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D69813/new/

https://reviews.llvm.org/D69813



_______________________________________________
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[PATCH] D69813: [analyzer] CERTStrChecker: Model gets()

Reply via email to