[PATCH] D69813: [analyzer] CERTStrChecker: Model gets()

Csaba Dabis via Phabricator via cfe-commits Wed, 13 Nov 2019 11:46:28 -0800

Charusso added inline comments.


================
Comment at: clang/lib/StaticAnalyzer/Checkers/cert/StrChecker.cpp:201-207
+void CERTStrChecker::evalGets(const CallEvent &Call, const CallContext &CallC,
+                              CheckerContext &C) const {
+  unsigned DestPos = *CallC.DestinationPos;
+  const Expr *DestArg = Call.getArgExpr(DestPos)->IgnoreImpCasts();
+  SVal DestV = Call.getArgSVal(DestPos);
+
+  auto Report = getReport(*BT, Call, CallC, C);
----------------
NoQ wrote:
> NoQ wrote:
> > All right, so basically what you're saying is that literally every 
> > invocation of `gets()` deserves a warning. This means that for all 
> > practical purposes your checker //is// an AST-based checker, just 
> > implemented with path-sensitive callbacks. A path-sensitive checker emits 
> > warnings based on multiple events that happen sequentially along the path 
> > (use-after-free: "memory deallocated - that same memory used", division by 
> > zero: "value constrained to zero - something is being divided by that same 
> > value", etc.) but your checker emits the warning by looking at only one 
> > statement: "`gets()` is invoked".
> > 
> > Do i understand correctly that your plan is to use path-sensitive analysis 
> > for fixits only? But you can't emit fixits for any truly path-sensitive 
> > warning anyway. Fixits must work correctly on all execution paths, so you 
> > cannot emit a correct fixit by looking at only one execution path. In order 
> > to emit fixits, you need to either resort to a pure AST check anyway ("this 
> > expression refers to an array of fixed size"), or maybe implement auxiliary 
> > data-flow analysis for a certain must-problem (eg., "the buffer argument 
> > may have exactly one possible value across all paths that reach `gets()`"). 
> > But in both cases the path-sensitive engine does literally nothing to help 
> > you; all the data that you'll need for your fixit will be available from 
> > the AST.
> Like, i think this was an interesting investigation and i was genuinely 
> curious about how this turns out to be, but for now it seems that the problem 
> you're trying to solve cannot be solved this way. Path sensitive analysis is 
> fundamentally applicable to only 50% of the problems (to "may-problems" but 
> not to "must-problems"), and the problem you're trying to solve is in the 
> latter category. I believe you'll have to fall back to the relatively boring 
> task of adding fixits to `security.insecureAPI.gets`; but then, again, if you 
> manage to employ use-def chains for this problem, that might be quite an 
> inspiring start.
Most of the time the given allocation to hold the arbitrary string happens in a 
local scope. After that we see `fscanf(dst)`, `gets(dst)`, `memcpy(dst)`, 
`strncpy(dst)`, stuff... which pushes new data into that memory block, and then 
the cool developers write that down: `dst[42] = '\0'` which means all the 
reports should be thrown away in a path-sensitive manner on `dst`. 
Reallocations, re-bindings, non-AST stuff could handled very easily with the 
non-AST checker, like that one.

Sometimes we are work with destination-array like `memcpy(Foo[Bar.Baz]->Qux, 
...)` which could not really handled with just a simple AST-based checker. I 
could not say at the moment we could handle it with symbols, but we have a much 
larger scope of information by symbols.

Most of the time because of the Analyzer is much smarter than the Tidy we could 
emit fix-its with the help of flow-sensitiveness very easily. I would create 
huge white-lists what we want to fix-it, and what we could not, but at some 
point if we model the symbols better, we can.

Other than that easy false-positive suppression and tiny flow-sensitive 
rebinding stuff, we could be sure what is going on by each string-manipulation. 
The `gets()` is a toy example where at most a `grep -rn 'gets('` could do 
better analysis than us.

The real world looks like that:
```
1       encryptedpasswordlen = ((strlen(passwd) + RADIUS_VECTOR_LENGTH - 1)
        / RADIUS_VECTOR_LENGTH) * RADIUS_VECTOR_LENGTH;
2       cryptvector = palloc(strlen(secret) + RADIUS_VECTOR_LENGTH);
3       memcpy(cryptvector, secret, strlen(secret));
...
4       for (i = 0; i < encryptedpasswordlen; i += RADIUS_VECTOR_LENGTH) {
5         memcpy(cryptvector + strlen(secret), md5trailer, 
RADIUS_VECTOR_LENGTH);
...
```
- from `postgresql/src/backend/libpq/auth.c`

At `3` we would emit a warning, because the null-termination left by the wrong 
size of the string, but at `5` we see that, it left, because at that offset the 
string continues, and dunno, on `6` when we model every flow-sensitive 
information, the string left non-terminated.

Of course each of that stuff is local and AST-based (with huge overhead of 
rebindings and impossible false-positive suppression), but when you have two of 
it, that is when the fun begins.


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D69813/new/

https://reviews.llvm.org/D69813



_______________________________________________
cfe-commits mailing list
[email protected]
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[PATCH] D69813: [analyzer] CERTStrChecker: Model gets()

Reply via email to