[PATCH] D30909: [Analyzer] Finish taint propagation to derived symbols of tainted regions

Artem Dergachev via Phabricator via cfe-commits Tue, 18 Apr 2017 06:55:58 -0700

NoQ added inline comments.


================
Comment at: lib/StaticAnalyzer/Checkers/GenericTaintChecker.cpp:494
+  SymbolManager &SM = C.getSymbolManager();
+  return SM.getDerivedSymbol(Sym, LCV.getRegion());
 }
----------------
vlad.tsyrklevich wrote:
> NoQ wrote:
> > vlad.tsyrklevich wrote:
> > > NoQ wrote:
> > > > I'd think about this a bit more and come back.
> > > > 
> > > > I need to understand how come that constructing a symbol manually is 
> > > > the right thing to do; that doesn't happen very often, but it seems 
> > > > correct here.
> > > Indeed it is odd. The best justification I could come up with: LCVs are 
> > > meant to be optimizations, their 'purpose' is to expose an SVal that 
> > > hides SymbolRef values so that we can have a split store. We don't have 
> > > to copy all of a compound values SymbolRef mappings because LCVs are kept 
> > > distinct. Hence to set/query/constrain region values you use SVals so 
> > > that RegionStore can differentiate between LCVs and SymbolRef backed 
> > > SVals for the two different stores it contains.
> > > 
> > > The taint interface however requires you taint a SymbolRef, not an SVal. 
> > > If we wanted, instead of doing this logic here, we could change 
> > > getPointedToSymbol() to return an SVal and update usages of it 
> > > accordingly since that value is only passed on to 
> > > ProgramState.isTainted()/ProgramState.addTaint() anyway. Then we could 
> > > update addTaint/isTainted to perform this logic, hiding it from the 
> > > checker.
> > > 
> > > This still requires manually constructing a symbol, now it's just 
> > > performed in the analyzer instead of in a checker. Not sure if that 
> > > addresses the issue you were considering, but the idea that we need to 
> > > 'undo' the LCV optimization hiding the SymbolRef to have a value to taint 
> > > seems somewhat convincing to me. What do you think?
> > Hmm (!) I suggest adding a new function to the program state, that we'd 
> > call `addPartialTaint()` or something like that, and this function would 
> > accept a symbol and a region and would act identically to passing a derived 
> > symbol (from this symbol and that region) to `addTaint()` (but we wouldn't 
> > need to actually construct a derived symbol here).
> > 
> > Such API would be easier to understand and use than the current approach 
> > that forces the user to construct a derived symbol manually in the checker 
> > code. Unfortunately, this checker's `getLCVSymbol()` would become a bit 
> > more complicated (having various return types depending on circumstances), 
> > but this misfortune seems more random than systematic to me.
> > 
> > Since we're having this new kind of partial taint, why don't we expose it 
> > in the API.
> I'm happy to implement it this way, but figured I'd ask why you prefer this 
> approach first in the interest of keeping the TaintChecker simple! The 
> optimal approach to me seems to be changing `getPointedToSymbol()` to 
> `getPointedToSVal()` and having `addTaint(SVal)` call `addPartialTaint()` 
> when it's passed an LCV sub-region. That way users of the taint interface 
> like the TaintChecker have a clean way to add & check regardless of whether 
> it's a SymbolRef or an LCV but the partial taint functionality is still 
> exposed and documented for those who might want to use it in new ways.
> 
> Just curious to understand your rationale. Thanks for the feedback!
Your idea actually looks good to me! I'd approve going this way.

With this change to `addTaint(SVal)`, i suspect it'd need some extra 
documentation to explain what it does now.


================
Comment at: lib/StaticAnalyzer/Core/ProgramState.cpp:676
+  if (const SymbolDerived *SD = dyn_cast<SymbolDerived>(Sym)) {
+    TaintedSymRegionsRef SymRegions(0, TSRFactory.getTreeFactory());
+
----------------
The `SymRegions` name is a bit confusing because we often shorten 
`SymbolicRegion` to `SymRegion` (eg. in dumps), which is not what we mean.


================
Comment at: lib/StaticAnalyzer/Core/ProgramState.cpp:682
+
+    SymRegions = SymRegions.add(SD->getRegion());
+    NewState = NewState->set<DerivedSymTaint>(SD->getParentSymbol(), 
SymRegions);
----------------
I wonder if it's worth it to check if a super-region of this region is already 
tainted, and avoid adding the region in this scenario.

I guess in practice it won't happen very often, because this code would most 
likely be executed just once per taint source. This probably deserves a comment 
though.


https://reviews.llvm.org/D30909



_______________________________________________
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[PATCH] D30909: [Analyzer] Finish taint propagation to derived symbols of tainted regions

Reply via email to