dkrupp added a comment.
@steakhal your comments are fixed. Thanks for the review.
================
Comment at: clang/lib/StaticAnalyzer/Checkers/DivZeroChecker.cpp:108-109
+ if ((stateNotZero && stateZero)) {
+ std::vector<SymbolRef> taintedSyms = getTaintedSymbols(C.getState(), *DV);
+ if (!taintedSyms.empty()) {
+ reportTaintBug("Division by a tainted value, possibly zero", stateZero,
C,
----------------
steakhal wrote:
>
We cannot get rid off the getTaintedSymbols() call, as we need to pass all
tainted symbols to reportTaintBug if we want to track back multiple variables.
taintedSyms is a parameter of reportTaintBug(...)
================
Comment at: clang/lib/StaticAnalyzer/Checkers/GenericTaintChecker.cpp:863-868
+ std::vector<SymbolRef> TaintedSyms =
+ getTaintedSymbols(State, Call.getReturnValue());
+ if (!TaintedSyms.empty()) {
+ TaintedSymbols.push_back(TaintedSyms[0]);
+ TaintedIndexes.push_back(ArgNum);
+ }
----------------
steakhal wrote:
>
I think this suggested solution would not be correct here, as ArgSym might not
be the actual _tainted_ symbol (inside a more complex expression).
So I would prefer to leave it like this for correctness.
================
Comment at: clang/lib/StaticAnalyzer/Checkers/GenericTaintChecker.cpp:875-879
+ std::vector<SymbolRef> TaintedSyms = getTaintedSymbols(State, *V);
+ if (!TaintedSyms.empty()) {
+ TaintedSymbols.push_back(TaintedSyms[0]);
+ TaintedIndexes.push_back(ArgNum);
+ }
----------------
steakhal wrote:
> In these cases, the code would acquire all the tainted subsymbols, which then
> we throw away and keep only the first one.
> This is why I suggested the approach I did I'm my last review.
I think this suggested solution would not be correct here, as ArgSym might not
be the actual _tainted_ symbol (inside a more complex expression).
So I would prefer to leave it like this for correctness.
================
Comment at: clang/lib/StaticAnalyzer/Checkers/Taint.cpp:294
+ std::vector<SymbolRef> TaintedCasts =
+ getTaintedSymbols(State, SC->getOperand(), Kind);
+ TaintedSymbols.insert(TaintedSymbols.begin(), TaintedCasts.begin(),
----------------
steakhal wrote:
> If `returnFirstOnly` is `true`, this `getTaintedSymbols()` call would still
> eagerly (needlessly) collect all of the symbols.
> I'd recommend propagating the `returnFirstOnly` parameter to the recursive
> calls to avoid this problem.
> I also encourage you to make use of the `llvm::append_range()` whenever makes
> sense.
You are perfectly right. I overlooked these calls and because of the the
default parameter did got get a warning. now fixed.
================
Comment at: clang/lib/StaticAnalyzer/Checkers/Taint.cpp:147-150
bool taint::isTainted(ProgramStateRef State, const Stmt *S,
const LocationContext *LCtx, TaintTagType Kind) {
- SVal val = State->getSVal(S, LCtx);
- return isTainted(State, val, Kind);
+ return !getTaintedSymbols(State, S, LCtx, Kind).empty();
}
----------------
dkrupp wrote:
> steakhal wrote:
> > TBH I'm not sure if I like that now we allocate unbounded amount of times
> > (because of `getTaintedSymbols()` is recursive and returns by value), where
> > we previously did not.
> >
> > What we could possibly do is to compute the elements of this sequence
> > lazily.
> > I'm thinking of the `llvm::mapped_iterator`, but I'm not sure if it's
> > possible to have something like that as a return type as it might encode
> > the map function in the type or something like that.
> > Anyway, I'm just saying that it would be nice to not do more than it's
> > necessary, and especially not allocate a lot of short-lived objects there.
> >
> > Do you think there is a way to have the cake and eat it too?
> >
> > ---
> >
> > I did some investigation, and one could get pretty far in the
> > implementation, and maybe even complete it but it would be really
> > complicated as of now. Maybe we could revisit this subject when we have
> > coroutines.
> >
> > So, I would suggest to have two sets of APIs:
> > - the usual `isTainted(.) -> bool`
> > - and a `getTaintedSymbols(.) -> vector<Sym>`
> > The important point would be that the `isTainted()` version would not
> > eagerly collect all tainted sub-syms but return on finding the first one.
> > While, the `getTaintedSymbols()` would collect eagerly all of them, as its
> > name suggests.
> >
> > Imagine if `getTaintedSymbolsImpl()` had an extra flag like `bool
> > returnAfterFirstMatch`. This way `isTainted()` can call it like that. While
> > in the other case, the parameter would be `false`, and eagerly collect all
> > symbols.
> >
> > This is probably the best of both worlds, as it prevents `isTainted` from
> > doing extra work and if we need to iterate over the tainted symbols, we
> > always iterate over all of them, so doing it lazily wouldn't gain us much
> > in that case anyway.
> > As a bonus, the user-facing API would be self-descriptive.
> >
> > WDYT?
> Good idea. I implemented the early return option in getTaintedSymbols(). This
> is used now by the isTainted() function.
First I wanted to avoid the getTaintedSymbols()->getTaintedSymbolsImpl() proxy
calls as it is too bloated IMHO.
But I see your point that it is safer. So I changed it.
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D144269/new/
https://reviews.llvm.org/D144269
_______________________________________________
cfe-commits mailing list
[email protected]
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits