Charusso added inline comments.
================
Comment at: clang/lib/StaticAnalyzer/Checkers/cert/StrChecker.cpp:201-207
+void CERTStrChecker::evalGets(const CallEvent &Call, const CallContext &CallC,
+ CheckerContext &C) const {
+ unsigned DestPos = *CallC.DestinationPos;
+ const Expr *DestArg = Call.getArgExpr(DestPos)->IgnoreImpCasts();
+ SVal DestV = Call.getArgSVal(DestPos);
+
+ auto Report = getReport(*BT, Call, CallC, C);
----------------
Charusso wrote:
> NoQ wrote:
> > NoQ wrote:
> > > All right, so basically what you're saying is that literally every
> > > invocation of `gets()` deserves a warning. This means that for all
> > > practical purposes your checker //is// an AST-based checker, just
> > > implemented with path-sensitive callbacks. A path-sensitive checker emits
> > > warnings based on multiple events that happen sequentially along the path
> > > (use-after-free: "memory deallocated - that same memory used", division
> > > by zero: "value constrained to zero - something is being divided by that
> > > same value", etc.) but your checker emits the warning by looking at only
> > > one statement: "`gets()` is invoked".
> > >
> > > Do i understand correctly that your plan is to use path-sensitive
> > > analysis for fixits only? But you can't emit fixits for any truly
> > > path-sensitive warning anyway. Fixits must work correctly on all
> > > execution paths, so you cannot emit a correct fixit by looking at only
> > > one execution path. In order to emit fixits, you need to either resort to
> > > a pure AST check anyway ("this expression refers to an array of fixed
> > > size"), or maybe implement auxiliary data-flow analysis for a certain
> > > must-problem (eg., "the buffer argument may have exactly one possible
> > > value across all paths that reach `gets()`"). But in both cases the
> > > path-sensitive engine does literally nothing to help you; all the data
> > > that you'll need for your fixit will be available from the AST.
> > Like, i think this was an interesting investigation and i was genuinely
> > curious about how this turns out to be, but for now it seems that the
> > problem you're trying to solve cannot be solved this way. Path sensitive
> > analysis is fundamentally applicable to only 50% of the problems (to
> > "may-problems" but not to "must-problems"), and the problem you're trying
> > to solve is in the latter category. I believe you'll have to fall back to
> > the relatively boring task of adding fixits to `security.insecureAPI.gets`;
> > but then, again, if you manage to employ use-def chains for this problem,
> > that might be quite an inspiring start.
> Most of the time the given allocation to hold the arbitrary string happens in
> a local scope. After that we see `fscanf(dst)`, `gets(dst)`, `memcpy(dst)`,
> `strncpy(dst)`, stuff... which pushes new data into that memory block, and
> then the cool developers write that down: `dst[42] = '\0'` which means all
> the reports should be thrown away in a path-sensitive manner on `dst`.
> Reallocations, re-bindings, non-AST stuff could handled very easily with the
> non-AST checker, like that one.
>
> Sometimes we are work with destination-array like `memcpy(Foo[Bar.Baz]->Qux,
> ...)` which could not really handled with just a simple AST-based checker. I
> could not say at the moment we could handle it with symbols, but we have a
> much larger scope of information by symbols.
>
> Most of the time because of the Analyzer is much smarter than the Tidy we
> could emit fix-its with the help of flow-sensitiveness very easily. I would
> create huge white-lists what we want to fix-it, and what we could not, but at
> some point if we model the symbols better, we can.
>
> Other than that easy false-positive suppression and tiny flow-sensitive
> rebinding stuff, we could be sure what is going on by each
> string-manipulation. The `gets()` is a toy example where at most a `grep -rn
> 'gets('` could do better analysis than us.
>
> The real world looks like that:
> ```
> 1 encryptedpasswordlen = ((strlen(passwd) + RADIUS_VECTOR_LENGTH - 1)
> / RADIUS_VECTOR_LENGTH) * RADIUS_VECTOR_LENGTH;
> 2 cryptvector = palloc(strlen(secret) + RADIUS_VECTOR_LENGTH);
> 3 memcpy(cryptvector, secret, strlen(secret));
> ...
> 4 for (i = 0; i < encryptedpasswordlen; i += RADIUS_VECTOR_LENGTH) {
> 5 memcpy(cryptvector + strlen(secret), md5trailer,
> RADIUS_VECTOR_LENGTH);
> ...
> ```
> - from `postgresql/src/backend/libpq/auth.c`
>
> At `3` we would emit a warning, because the null-termination left by the
> wrong size of the string, but at `5` we see that, it left, because at that
> offset the string continues, and dunno, on `6` when we model every
> flow-sensitive information, the string left non-terminated.
>
> Of course each of that stuff is local and AST-based (with huge overhead of
> rebindings and impossible false-positive suppression), but when you have two
> of it, that is when the fun begins.
> if you manage to employ use-def chains for this problem, that might be quite
> an inspiring start.
We have regions so we do not need to rely on such chains in the AST-world, if I
get your idea right by the "Use-define chain" wiki [1]. Btw. it is not that
difficult problem in the AST-world, you need to create a recursive AST-matcher
on the `DeclRefExpr` with `std::function`.
Basically, I want to implement all the STR rules in a logical order, here is
one of the examples from STR32-C [2] which is my last planned project at the
moment:
```
void lessen_memory_usage(void) {
wchar_t *temp;
size_t temp_size;
/* ... */
if (cur_msg != NULL) {
temp_size = cur_msg_size / 2 + 1;
temp = realloc(cur_msg, temp_size * sizeof(wchar_t));
/* temp &and cur_msg may no longer be null-terminated */
if (temp == NULL) {
/* Handle error */
}
cur_msg = temp;
cur_msg_size = temp_size;
cur_msg_len = wcslen(cur_msg);
}
}
```
They really want to represent the wild, and please think of that problem in
terms of an AST-checker versus in terms of `getAsRegion` and
`getDynamicElementCount` to compare the size of the allocated memory block and
inject that: `cur_msg[temp_size - 1] = L'\0';` because the array would
overflow. How cool is the Analyzer and how smart to do so.
It would took at most 10 minutes to implement if the `evalBinOp` would work or
the main 10 years old implementation of obtaining the element-count would work.
I am on the way to fixing the latter, but it will be more path-sensitive info,
than you could imagine, like reusing the "zombie" size-expression turned out to
be a hard problem. And it will be a lot easier to solve such problems, I
believe.
The local scope is the key, and that checker at the moment only tries to
rewrite destination-arrays which are local. I think we could see if we emit
multiple reports on a given call so due to ambiguity we would drop such
fix-its. With an AST checker I could not imagine how difficult it would be.
It is rather a research at the moment, because I have encountered dozens of
silly stuff, beginning with the `getExtent()`, so I cannot say this direction
is the 100% future, but I have picked the Analyzer over Tidy, for that reason,
it is smarter.
[1] https://en.wikipedia.org/wiki/Use-define_chain
[2]
https://wiki.sei.cmu.edu/confluence/display/c/STR32-C.+Do+not+pass+a+non-null-terminated+character+sequence+to+a+library+function+that+expects+a+string
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D69813/new/
https://reviews.llvm.org/D69813
_______________________________________________
cfe-commits mailing list
[email protected]
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits