Charusso added inline comments.
================
Comment at: clang/lib/StaticAnalyzer/Checkers/cert/StrChecker.cpp:201-207
+void CERTStrChecker::evalGets(const CallEvent &Call, const CallContext &CallC,
+ CheckerContext &C) const {
+ unsigned DestPos = *CallC.DestinationPos;
+ const Expr *DestArg = Call.getArgExpr(DestPos)->IgnoreImpCasts();
+ SVal DestV = Call.getArgSVal(DestPos);
+
+ auto Report = getReport(*BT, Call, CallC, C);
----------------
NoQ wrote:
> NoQ wrote:
> > All right, so basically what you're saying is that literally every
> > invocation of `gets()` deserves a warning. This means that for all
> > practical purposes your checker //is// an AST-based checker, just
> > implemented with path-sensitive callbacks. A path-sensitive checker emits
> > warnings based on multiple events that happen sequentially along the path
> > (use-after-free: "memory deallocated - that same memory used", division by
> > zero: "value constrained to zero - something is being divided by that same
> > value", etc.) but your checker emits the warning by looking at only one
> > statement: "`gets()` is invoked".
> >
> > Do i understand correctly that your plan is to use path-sensitive analysis
> > for fixits only? But you can't emit fixits for any truly path-sensitive
> > warning anyway. Fixits must work correctly on all execution paths, so you
> > cannot emit a correct fixit by looking at only one execution path. In order
> > to emit fixits, you need to either resort to a pure AST check anyway ("this
> > expression refers to an array of fixed size"), or maybe implement auxiliary
> > data-flow analysis for a certain must-problem (eg., "the buffer argument
> > may have exactly one possible value across all paths that reach `gets()`").
> > But in both cases the path-sensitive engine does literally nothing to help
> > you; all the data that you'll need for your fixit will be available from
> > the AST.
> Like, i think this was an interesting investigation and i was genuinely
> curious about how this turns out to be, but for now it seems that the problem
> you're trying to solve cannot be solved this way. Path sensitive analysis is
> fundamentally applicable to only 50% of the problems (to "may-problems" but
> not to "must-problems"), and the problem you're trying to solve is in the
> latter category. I believe you'll have to fall back to the relatively boring
> task of adding fixits to `security.insecureAPI.gets`; but then, again, if you
> manage to employ use-def chains for this problem, that might be quite an
> inspiring start.
Most of the time the given allocation to hold the arbitrary string happens in a
local scope. After that we see `fscanf(dst)`, `gets(dst)`, `memcpy(dst)`,
`strncpy(dst)`, stuff... which pushes new data into that memory block, and then
the cool developers write that down: `dst[42] = '\0'` which means all the
reports should be thrown away in a path-sensitive manner on `dst`.
Reallocations, re-bindings, non-AST stuff could handled very easily with the
non-AST checker, like that one.
Sometimes we are work with destination-array like `memcpy(Foo[Bar.Baz]->Qux,
...)` which could not really handled with just a simple AST-based checker. I
could not say at the moment we could handle it with symbols, but we have a much
larger scope of information by symbols.
Most of the time because of the Analyzer is much smarter than the Tidy we could
emit fix-its with the help of flow-sensitiveness very easily. I would create
huge white-lists what we want to fix-it, and what we could not, but at some
point if we model the symbols better, we can.
Other than that easy false-positive suppression and tiny flow-sensitive
rebinding stuff, we could be sure what is going on by each string-manipulation.
The `gets()` is a toy example where at most a `grep -rn 'gets('` could do
better analysis than us.
The real world looks like that:
```
1 encryptedpasswordlen = ((strlen(passwd) + RADIUS_VECTOR_LENGTH - 1)
/ RADIUS_VECTOR_LENGTH) * RADIUS_VECTOR_LENGTH;
2 cryptvector = palloc(strlen(secret) + RADIUS_VECTOR_LENGTH);
3 memcpy(cryptvector, secret, strlen(secret));
...
4 for (i = 0; i < encryptedpasswordlen; i += RADIUS_VECTOR_LENGTH) {
5 memcpy(cryptvector + strlen(secret), md5trailer,
RADIUS_VECTOR_LENGTH);
...
```
- from `postgresql/src/backend/libpq/auth.c`
At `3` we would emit a warning, because the null-termination left by the wrong
size of the string, but at `5` we see that, it left, because at that offset the
string continues, and dunno, on `6` when we model every flow-sensitive
information, the string left non-terminated.
Of course each of that stuff is local and AST-based (with huge overhead of
rebindings and impossible false-positive suppression), but when you have two of
it, that is when the fun begins.
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D69813/new/
https://reviews.llvm.org/D69813
_______________________________________________
cfe-commits mailing list
[email protected]
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits