steakhal added a comment.
Looks even better. Only minor concerns remained, mostly about style and
suggestions of llvm utilities.
================
Comment at: clang/lib/StaticAnalyzer/Checkers/GenericTaintChecker.cpp:129-130
/// Given a pointer/reference argument, return the value it refers to.
-std::optional<SVal> getPointeeOf(const CheckerContext &C, SVal Arg) {
+std::optional<SVal> getPointeeOf(ASTContext &ASTCtx,
+ const ProgramStateRef State, SVal Arg) {
if (auto LValue = Arg.getAs<Loc>())
----------------
BTW I don't know but `State->getStateManager().getContext()` can give you an
`ASTContext`. And we tend to not put `const` to variable declarations. See [[
https://releases.llvm.org/4.0.0/tools/clang/tools/extra/docs/clang-tidy/checks/readability-avoid-const-params-in-decls.html
| readability-avoid-const-params-in-decls ]]
In other places we tend to refer to `ASTContext` by the `ACtx` I think.
We also prefer const refs over mutable refs. Is the mutable ref justified for
this case?
================
Comment at: clang/lib/StaticAnalyzer/Checkers/GenericTaintChecker.cpp:159
+/// when the return value, or the outgoing parameters are tainted.
+const NoteTag *taintOriginTrackerTag(CheckerContext &C,
+ std::vector<SymbolRef> TaintedSymbols,
----------------
My bad. In LLVM style we use `UpperCamelCase` for variable names.
================
Comment at: clang/lib/StaticAnalyzer/Checkers/GenericTaintChecker.cpp:173-175
+ if (TaintedSymbols.empty()) {
+ return "Taint originated here";
+ }
----------------
Generally, in LLVM style we don't put braces to single block statements unless
it would hurt readability, which I don't think applies here.
================
Comment at: clang/lib/StaticAnalyzer/Checkers/GenericTaintChecker.cpp:176-180
+ for (auto Sym : llvm::enumerate(TaintedSymbols)) {
+ LLVM_DEBUG(llvm::dbgs() << "Taint Propagated from argument "
+ << TaintedArgs.at(Sym.index()) + 1 << "\n");
+ BR.markInteresting(Sym.value());
+ }
----------------
I was also bad with this recommendation.
I think we can now use structured bindings to get the index and value right
there, like:
`for (auto [Idx, Sym] : llvm::enumerate(TaintedSymbols))`
[[
https://github.com/llvm/llvm-project/blob/main/llvm/docs/ProgrammersManual.rst#enumerate
| See ]]
================
Comment at: clang/lib/StaticAnalyzer/Checkers/GenericTaintChecker.cpp:203
+ int nofTaintedArgs = 0;
+ for (auto Sym : llvm::enumerate(TaintedSymbols)) {
+ if (BR.isInteresting(Sym.value())) {
----------------
Same here about structured bindings.
================
Comment at: clang/lib/StaticAnalyzer/Checkers/GenericTaintChecker.cpp:209-214
+ if (nofTaintedArgs == 0)
+ Out << "Taint propagated to argument "
+ << TaintedArgs.at(Sym.index()) + 1;
+ else
+ Out << ", " << TaintedArgs.at(Sym.index()) + 1;
+ nofTaintedArgs++;
----------------
I'd recommend using `llvm::interleaveComma()` in such cases.
You can probably get rid of `nofTaintedArgs` as well - by using this function.
================
Comment at: clang/lib/StaticAnalyzer/Checkers/GenericTaintChecker.cpp:211
+ Out << "Taint propagated to argument "
+ << TaintedArgs.at(Sym.index()) + 1;
+ else
----------------
For clang diagnostics we usually use ordinary suffixes like `{st,nd,rd,th}`. It
would be nice to align with the rest of the clang diagnostics on this.
It would require a bit of work on the wording though, I admit.
================
Comment at: clang/lib/StaticAnalyzer/Checkers/GenericTaintChecker.cpp:213
+ else
+ Out << ", " << TaintedArgs.at(Sym.index()) + 1;
+ nofTaintedArgs++;
----------------
I believe this branch is uncovered by tests.
================
Comment at: clang/lib/StaticAnalyzer/Checkers/GenericTaintChecker.cpp:221
+ }
+ return std::string(Out.str());
+ });
----------------
I think since you explicitly specify the return type of the lambda, you could
omit the spelling of `std::string` here.
================
Comment at: clang/lib/StaticAnalyzer/Checkers/GenericTaintChecker.cpp:863-864
State = addTaint(State, Call.getReturnValue());
+ SymbolRef TaintedSym = isTainted(State, Call.getReturnValue());
+ if (TaintedSym) {
+ TaintedSymbols.push_back(TaintedSym);
----------------
We tend to fuse such declarations:
I've seen other cases like this elsewhere. Please check.
================
Comment at: clang/lib/StaticAnalyzer/Checkers/GenericTaintChecker.cpp:875-878
+ // FIXME: The call argument may be a complex expression
+ // referring to multiple tainted variables.
+ // Now we generate notes and track back only one of them.
+ SymbolRef TaintedSym = isTainted(State, *V);
----------------
You could iterate over the symbol dependencies of the SymExpr (of the `*V`
SVal).
```lang=c++
SymbolRef PointeeAsSym = V->getAsSymbol();
// eee, can it be null? Sure it can. See isTainted(Region),... for those cases
we would need to descend and check their symbol dependencies.
for (SymbolRef SubSym : llvm::make_range(PointeeAsSym->symbol_begin(),
PointeeAsSym->symbol_end())) {
// TODO: check each if it's also tainted, and update the `TaintedSymbols`
accordingly, IDK.
}
```
Something like this should work for most cases (except when `*V` refers to a
tainted region instead of a symbol), I think.
================
Comment at: clang/lib/StaticAnalyzer/Checkers/GenericTaintChecker.cpp:162
+ const CallEvent& Call) {
+ const LocationContext* LC = Call.getCalleeStackFrame(0);
+
----------------
dkrupp wrote:
> steakhal wrote:
> > If the `Call` parameter is only used for acquiring the `LocationContext`,
> > wouldn't it be more descriptive to directly pass the `LocationContext` to
> > the function instead?
> > I'm also puzzled that we use `getCalleeStackFrame` here. I rarely ever see
> > this function, so I'm a bit worried if this pick was intentional. That we
> > pass the `0` as the `BlockCount` argument only reinforces this instinct.
> The call.getCalleeStackFrame(0) gets the location context of the actual call
> that we are analyzing (in the pre or postcall), and that's what we need to
> mark interesting. It is intentionally used like this. I changed the parameter
> to locationcontext as use suggested.
Okay.
================
Comment at: clang/lib/StaticAnalyzer/Checkers/Taint.cpp:204
// If this is a SymbolDerived with a tainted parent, it's also tainted.
- if (isTainted(State, SD->getParentSymbol(), Kind))
- return true;
+ if (SymbolRef TSR = isTainted(State, SD->getParentSymbol(), Kind))
+ return TSR;
----------------
Here we still have the `TSR` token.
================
Comment at: clang/lib/StaticAnalyzer/Checkers/VLASizeChecker.cpp:220-228
+ if (Kind == VLA_Tainted)
+ BT.reset(new BugType(this,
+ "Dangerous variable-length array (VLA) declaration",
+ categories::TaintedData));
+ else
+ BT.reset(new BugType(this,
+ "Dangerous variable-length array (VLA) declaration",
----------------
dkrupp wrote:
> steakhal wrote:
> > Why don't we use a distinct BugType for this?
> You mean a new bug type instances? Would there be an advantage for that?
> Seemed to be simpler this way. To distinguish identify the tainted reports
> with the bug category.
> You mean a new bug type instances? Would there be an advantage for that?
> Seemed to be simpler this way. To distinguish identify the tainted reports
> with the bug category.
I never checked how `BugTypes` constitute to bugreport construction, but my gut
instinct suggests that we should have two separate instances like we frequently
do for other checkers.
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D144269/new/
https://reviews.llvm.org/D144269
_______________________________________________
cfe-commits mailing list
[email protected]
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits