[PATCH] D144269: [Analyzer] Show "taint originated here" note of alpha.security.taint.TaintPropagation checker at the correct place

Daniel Krupp via Phabricator via cfe-commits Sat, 22 Apr 2023 08:34:09 -0700

dkrupp added a comment.

@steakhal your comments are fixed. Thanks for the review.




================
Comment at: clang/lib/StaticAnalyzer/Checkers/DivZeroChecker.cpp:108-109
+  if ((stateNotZero && stateZero)) {
+    std::vector<SymbolRef> taintedSyms = getTaintedSymbols(C.getState(), *DV);
+    if (!taintedSyms.empty()) {
+      reportTaintBug("Division by a tainted value, possibly zero", stateZero, 
C,
----------------
steakhal wrote:
> 
We cannot get rid off the getTaintedSymbols() call, as we need to pass all 
tainted symbols to reportTaintBug if we want to track back multiple variables. 
taintedSyms is a parameter of reportTaintBug(...)


================
Comment at: clang/lib/StaticAnalyzer/Checkers/GenericTaintChecker.cpp:863-868
+      std::vector<SymbolRef> TaintedSyms =
+          getTaintedSymbols(State, Call.getReturnValue());
+      if (!TaintedSyms.empty()) {
+        TaintedSymbols.push_back(TaintedSyms[0]);
+        TaintedIndexes.push_back(ArgNum);
+      }
----------------
steakhal wrote:
> 
I think this suggested solution would not be correct here, as ArgSym might not 
be the actual _tainted_ symbol (inside a more complex expression).

So I would prefer to leave it like this for correctness.


================
Comment at: clang/lib/StaticAnalyzer/Checkers/GenericTaintChecker.cpp:875-879
+      std::vector<SymbolRef> TaintedSyms = getTaintedSymbols(State, *V);
+      if (!TaintedSyms.empty()) {
+        TaintedSymbols.push_back(TaintedSyms[0]);
+        TaintedIndexes.push_back(ArgNum);
+      }
----------------
steakhal wrote:
> In these cases, the code would acquire all the tainted subsymbols, which then 
> we throw away and keep only the first one.
> This is why I suggested the approach I did I'm my last review.
I think this suggested solution would not be correct here, as ArgSym might not 
be the actual _tainted_ symbol (inside a more complex expression).

So I would prefer to leave it like this for correctness.


================
Comment at: clang/lib/StaticAnalyzer/Checkers/Taint.cpp:294
+      std::vector<SymbolRef> TaintedCasts =
+          getTaintedSymbols(State, SC->getOperand(), Kind);
+      TaintedSymbols.insert(TaintedSymbols.begin(), TaintedCasts.begin(),
----------------
steakhal wrote:
> If `returnFirstOnly` is `true`, this `getTaintedSymbols()` call would still 
> eagerly (needlessly) collect all of the symbols.
> I'd recommend propagating the `returnFirstOnly` parameter to the recursive 
> calls to avoid this problem.
> I also encourage you to make use of the `llvm::append_range()` whenever makes 
> sense.
You are perfectly right. I overlooked these calls and because of the the 
default parameter did got get a warning.  now fixed.


================
Comment at: clang/lib/StaticAnalyzer/Checkers/Taint.cpp:147-150
 bool taint::isTainted(ProgramStateRef State, const Stmt *S,
                       const LocationContext *LCtx, TaintTagType Kind) {
-  SVal val = State->getSVal(S, LCtx);
-  return isTainted(State, val, Kind);
+  return !getTaintedSymbols(State, S, LCtx, Kind).empty();
 }
----------------
dkrupp wrote:
> steakhal wrote:
> > TBH I'm not sure if I like that now we allocate unbounded amount of times 
> > (because of `getTaintedSymbols()` is recursive and returns by value), where 
> > we previously did not.
> > 
> > What we could possibly do is to compute the elements of this sequence 
> > lazily.
> > I'm thinking of the `llvm::mapped_iterator`, but I'm not sure if it's 
> > possible to have something like that as a return type as it might encode 
> > the map function in the type or something like that.
> > Anyway, I'm just saying that it would be nice to not do more than it's 
> > necessary, and especially not allocate a lot of short-lived objects there.
> > 
> > Do you think there is a way to have the cake and eat it too?
> > 
> > ---
> > 
> > I did some investigation, and one could get pretty far in the 
> > implementation, and maybe even complete it but it would be really 
> > complicated as of now. Maybe we could revisit this subject when we have 
> > coroutines.
> > 
> > So, I would suggest to have two sets of APIs:
> >  - the usual `isTainted(.) -> bool`
> >  - and a `getTaintedSymbols(.) -> vector<Sym>`
> > The important point would be that the `isTainted()` version would not 
> > eagerly collect all tainted sub-syms but return on finding the first one.
> > While, the `getTaintedSymbols()` would collect eagerly all of them, as its 
> > name suggests.
> > 
> > Imagine if `getTaintedSymbolsImpl()` had an extra flag like `bool 
> > returnAfterFirstMatch`. This way `isTainted()` can call it like that. While 
> > in the other case, the parameter would be `false`, and eagerly collect all 
> > symbols.
> > 
> > This is probably the best of both worlds, as it prevents `isTainted` from 
> > doing extra work and if we need to iterate over the tainted symbols, we 
> > always iterate over all of them, so doing it lazily wouldn't gain us much 
> > in that case anyway.
> > As a bonus, the user-facing API would be self-descriptive.
> > 
> > WDYT?
> Good idea. I implemented the early return option in getTaintedSymbols(). This 
> is used now by the isTainted() function.
First I wanted to avoid the getTaintedSymbols()->getTaintedSymbolsImpl() proxy 
calls as it is too bloated IMHO.
But I see your point that it is safer. So I changed it.


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D144269/new/

https://reviews.llvm.org/D144269

_______________________________________________
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[PATCH] D144269: [Analyzer] Show "taint originated here" note of alpha.security.taint.TaintPropagation checker at the correct place

Reply via email to