dkrupp added a comment. @steakhal your comments are fixed. Thanks for the review.
================ Comment at: clang/lib/StaticAnalyzer/Checkers/DivZeroChecker.cpp:108-109 + if ((stateNotZero && stateZero)) { + std::vector<SymbolRef> taintedSyms = getTaintedSymbols(C.getState(), *DV); + if (!taintedSyms.empty()) { + reportTaintBug("Division by a tainted value, possibly zero", stateZero, C, ---------------- steakhal wrote: > We cannot get rid off the getTaintedSymbols() call, as we need to pass all tainted symbols to reportTaintBug if we want to track back multiple variables. taintedSyms is a parameter of reportTaintBug(...) ================ Comment at: clang/lib/StaticAnalyzer/Checkers/GenericTaintChecker.cpp:863-868 + std::vector<SymbolRef> TaintedSyms = + getTaintedSymbols(State, Call.getReturnValue()); + if (!TaintedSyms.empty()) { + TaintedSymbols.push_back(TaintedSyms[0]); + TaintedIndexes.push_back(ArgNum); + } ---------------- steakhal wrote: > I think this suggested solution would not be correct here, as ArgSym might not be the actual _tainted_ symbol (inside a more complex expression). So I would prefer to leave it like this for correctness. ================ Comment at: clang/lib/StaticAnalyzer/Checkers/GenericTaintChecker.cpp:875-879 + std::vector<SymbolRef> TaintedSyms = getTaintedSymbols(State, *V); + if (!TaintedSyms.empty()) { + TaintedSymbols.push_back(TaintedSyms[0]); + TaintedIndexes.push_back(ArgNum); + } ---------------- steakhal wrote: > In these cases, the code would acquire all the tainted subsymbols, which then > we throw away and keep only the first one. > This is why I suggested the approach I did I'm my last review. I think this suggested solution would not be correct here, as ArgSym might not be the actual _tainted_ symbol (inside a more complex expression). So I would prefer to leave it like this for correctness. ================ Comment at: clang/lib/StaticAnalyzer/Checkers/Taint.cpp:294 + std::vector<SymbolRef> TaintedCasts = + getTaintedSymbols(State, SC->getOperand(), Kind); + TaintedSymbols.insert(TaintedSymbols.begin(), TaintedCasts.begin(), ---------------- steakhal wrote: > If `returnFirstOnly` is `true`, this `getTaintedSymbols()` call would still > eagerly (needlessly) collect all of the symbols. > I'd recommend propagating the `returnFirstOnly` parameter to the recursive > calls to avoid this problem. > I also encourage you to make use of the `llvm::append_range()` whenever makes > sense. You are perfectly right. I overlooked these calls and because of the the default parameter did got get a warning. now fixed. ================ Comment at: clang/lib/StaticAnalyzer/Checkers/Taint.cpp:147-150 bool taint::isTainted(ProgramStateRef State, const Stmt *S, const LocationContext *LCtx, TaintTagType Kind) { - SVal val = State->getSVal(S, LCtx); - return isTainted(State, val, Kind); + return !getTaintedSymbols(State, S, LCtx, Kind).empty(); } ---------------- dkrupp wrote: > steakhal wrote: > > TBH I'm not sure if I like that now we allocate unbounded amount of times > > (because of `getTaintedSymbols()` is recursive and returns by value), where > > we previously did not. > > > > What we could possibly do is to compute the elements of this sequence > > lazily. > > I'm thinking of the `llvm::mapped_iterator`, but I'm not sure if it's > > possible to have something like that as a return type as it might encode > > the map function in the type or something like that. > > Anyway, I'm just saying that it would be nice to not do more than it's > > necessary, and especially not allocate a lot of short-lived objects there. > > > > Do you think there is a way to have the cake and eat it too? > > > > --- > > > > I did some investigation, and one could get pretty far in the > > implementation, and maybe even complete it but it would be really > > complicated as of now. Maybe we could revisit this subject when we have > > coroutines. > > > > So, I would suggest to have two sets of APIs: > > - the usual `isTainted(.) -> bool` > > - and a `getTaintedSymbols(.) -> vector<Sym>` > > The important point would be that the `isTainted()` version would not > > eagerly collect all tainted sub-syms but return on finding the first one. > > While, the `getTaintedSymbols()` would collect eagerly all of them, as its > > name suggests. > > > > Imagine if `getTaintedSymbolsImpl()` had an extra flag like `bool > > returnAfterFirstMatch`. This way `isTainted()` can call it like that. While > > in the other case, the parameter would be `false`, and eagerly collect all > > symbols. > > > > This is probably the best of both worlds, as it prevents `isTainted` from > > doing extra work and if we need to iterate over the tainted symbols, we > > always iterate over all of them, so doing it lazily wouldn't gain us much > > in that case anyway. > > As a bonus, the user-facing API would be self-descriptive. > > > > WDYT? > Good idea. I implemented the early return option in getTaintedSymbols(). This > is used now by the isTainted() function. First I wanted to avoid the getTaintedSymbols()->getTaintedSymbolsImpl() proxy calls as it is too bloated IMHO. But I see your point that it is safer. So I changed it. CHANGES SINCE LAST ACTION https://reviews.llvm.org/D144269/new/ https://reviews.llvm.org/D144269 _______________________________________________ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits