Charusso added inline comments.
================ Comment at: clang/lib/StaticAnalyzer/Checkers/cert/StrChecker.cpp:201-207 +void CERTStrChecker::evalGets(const CallEvent &Call, const CallContext &CallC, + CheckerContext &C) const { + unsigned DestPos = *CallC.DestinationPos; + const Expr *DestArg = Call.getArgExpr(DestPos)->IgnoreImpCasts(); + SVal DestV = Call.getArgSVal(DestPos); + + auto Report = getReport(*BT, Call, CallC, C); ---------------- Charusso wrote: > NoQ wrote: > > NoQ wrote: > > > All right, so basically what you're saying is that literally every > > > invocation of `gets()` deserves a warning. This means that for all > > > practical purposes your checker //is// an AST-based checker, just > > > implemented with path-sensitive callbacks. A path-sensitive checker emits > > > warnings based on multiple events that happen sequentially along the path > > > (use-after-free: "memory deallocated - that same memory used", division > > > by zero: "value constrained to zero - something is being divided by that > > > same value", etc.) but your checker emits the warning by looking at only > > > one statement: "`gets()` is invoked". > > > > > > Do i understand correctly that your plan is to use path-sensitive > > > analysis for fixits only? But you can't emit fixits for any truly > > > path-sensitive warning anyway. Fixits must work correctly on all > > > execution paths, so you cannot emit a correct fixit by looking at only > > > one execution path. In order to emit fixits, you need to either resort to > > > a pure AST check anyway ("this expression refers to an array of fixed > > > size"), or maybe implement auxiliary data-flow analysis for a certain > > > must-problem (eg., "the buffer argument may have exactly one possible > > > value across all paths that reach `gets()`"). But in both cases the > > > path-sensitive engine does literally nothing to help you; all the data > > > that you'll need for your fixit will be available from the AST. > > Like, i think this was an interesting investigation and i was genuinely > > curious about how this turns out to be, but for now it seems that the > > problem you're trying to solve cannot be solved this way. Path sensitive > > analysis is fundamentally applicable to only 50% of the problems (to > > "may-problems" but not to "must-problems"), and the problem you're trying > > to solve is in the latter category. I believe you'll have to fall back to > > the relatively boring task of adding fixits to `security.insecureAPI.gets`; > > but then, again, if you manage to employ use-def chains for this problem, > > that might be quite an inspiring start. > Most of the time the given allocation to hold the arbitrary string happens in > a local scope. After that we see `fscanf(dst)`, `gets(dst)`, `memcpy(dst)`, > `strncpy(dst)`, stuff... which pushes new data into that memory block, and > then the cool developers write that down: `dst[42] = '\0'` which means all > the reports should be thrown away in a path-sensitive manner on `dst`. > Reallocations, re-bindings, non-AST stuff could handled very easily with the > non-AST checker, like that one. > > Sometimes we are work with destination-array like `memcpy(Foo[Bar.Baz]->Qux, > ...)` which could not really handled with just a simple AST-based checker. I > could not say at the moment we could handle it with symbols, but we have a > much larger scope of information by symbols. > > Most of the time because of the Analyzer is much smarter than the Tidy we > could emit fix-its with the help of flow-sensitiveness very easily. I would > create huge white-lists what we want to fix-it, and what we could not, but at > some point if we model the symbols better, we can. > > Other than that easy false-positive suppression and tiny flow-sensitive > rebinding stuff, we could be sure what is going on by each > string-manipulation. The `gets()` is a toy example where at most a `grep -rn > 'gets('` could do better analysis than us. > > The real world looks like that: > ``` > 1 encryptedpasswordlen = ((strlen(passwd) + RADIUS_VECTOR_LENGTH - 1) > / RADIUS_VECTOR_LENGTH) * RADIUS_VECTOR_LENGTH; > 2 cryptvector = palloc(strlen(secret) + RADIUS_VECTOR_LENGTH); > 3 memcpy(cryptvector, secret, strlen(secret)); > ... > 4 for (i = 0; i < encryptedpasswordlen; i += RADIUS_VECTOR_LENGTH) { > 5 memcpy(cryptvector + strlen(secret), md5trailer, > RADIUS_VECTOR_LENGTH); > ... > ``` > - from `postgresql/src/backend/libpq/auth.c` > > At `3` we would emit a warning, because the null-termination left by the > wrong size of the string, but at `5` we see that, it left, because at that > offset the string continues, and dunno, on `6` when we model every > flow-sensitive information, the string left non-terminated. > > Of course each of that stuff is local and AST-based (with huge overhead of > rebindings and impossible false-positive suppression), but when you have two > of it, that is when the fun begins. > if you manage to employ use-def chains for this problem, that might be quite > an inspiring start. We have regions so we do not need to rely on such chains in the AST-world, if I get your idea right by the "Use-define chain" wiki [1]. Btw. it is not that difficult problem in the AST-world, you need to create a recursive AST-matcher on the `DeclRefExpr` with `std::function`. Basically, I want to implement all the STR rules in a logical order, here is one of the examples from STR32-C [2] which is my last planned project at the moment: ``` void lessen_memory_usage(void) { wchar_t *temp; size_t temp_size; /* ... */ if (cur_msg != NULL) { temp_size = cur_msg_size / 2 + 1; temp = realloc(cur_msg, temp_size * sizeof(wchar_t)); /* temp &and cur_msg may no longer be null-terminated */ if (temp == NULL) { /* Handle error */ } cur_msg = temp; cur_msg_size = temp_size; cur_msg_len = wcslen(cur_msg); } } ``` They really want to represent the wild, and please think of that problem in terms of an AST-checker versus in terms of `getAsRegion` and `getDynamicElementCount` to compare the size of the allocated memory block and inject that: `cur_msg[temp_size - 1] = L'\0';` because the array would overflow. How cool is the Analyzer and how smart to do so. It would took at most 10 minutes to implement if the `evalBinOp` would work or the main 10 years old implementation of obtaining the element-count would work. I am on the way to fixing the latter, but it will be more path-sensitive info, than you could imagine, like reusing the "zombie" size-expression turned out to be a hard problem. And it will be a lot easier to solve such problems, I believe. The local scope is the key, and that checker at the moment only tries to rewrite destination-arrays which are local. I think we could see if we emit multiple reports on a given call so due to ambiguity we would drop such fix-its. With an AST checker I could not imagine how difficult it would be. It is rather a research at the moment, because I have encountered dozens of silly stuff, beginning with the `getExtent()`, so I cannot say this direction is the 100% future, but I have picked the Analyzer over Tidy, for that reason, it is smarter. [1] https://en.wikipedia.org/wiki/Use-define_chain [2] https://wiki.sei.cmu.edu/confluence/display/c/STR32-C.+Do+not+pass+a+non-null-terminated+character+sequence+to+a+library+function+that+expects+a+string CHANGES SINCE LAST ACTION https://reviews.llvm.org/D69813/new/ https://reviews.llvm.org/D69813 _______________________________________________ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits