[PATCH] D153131: [clang analysis][thread-safety] Handle return-by-reference...

Aaron Puchert via Phabricator via cfe-commits Sun, 17 Sep 2023 15:15:01 -0700

aaronpuchert added a comment.

Sorry for letting this collect dust. I think it's a valuable addition, and 
looks pretty good, except that I think we should use the expected exit set 
instead of the entry set. These can be legitimately different for appropriately 
annotated functions, i.e. with `acquire_(shared_)capability` or 
`release_(shared_)capability`.


Apart from the bug mentioned earlier, which should now be fixed, this looks 
good on our code base. I have to admit a separate flag could be nice for 
migration, but it should be included by default in `-Wthread-safety-reference`. 
But I'm not sure if we need it.



================
Comment at: clang/include/clang/Basic/DiagnosticGroups.td:1046
 def ThreadSafetyPrecise    : DiagGroup<"thread-safety-precise">;
 def ThreadSafetyReference  : DiagGroup<"thread-safety-reference">;
+def ThreadSafetyReturn     : DiagGroup<"thread-safety-return">;
----------------
courbet wrote:
> aaronpuchert wrote:
> > courbet wrote:
> > > aaronpuchert wrote:
> > > > courbet wrote:
> > > > > aaronpuchert wrote:
> > > > > > Why not under `-Wthread-safety-reference`, as it's 
> > > > > > return-by-reference that you're warning on? This seems too small 
> > > > > > for a separate flag to me.
> > > > > The main reason it so that we provide a soft transition period for 
> > > > > users: If we put that in `-Wthread-safety-reference`, we'll start 
> > > > > breaking compile for people who use `-Werror`, while a separate flag 
> > > > > allows a transition period where people opt into the new feature.
> > > > Transition flags can end up resulting in more churn, assuming that we 
> > > > eventually want to put this under `-Wthread-safety-reference`, because 
> > > > then you have two categories of users:
> > > > * those that opted in will eventually have to remove the flag again, and
> > > > * those that didn't will get hard errors on updating the compiler at 
> > > > that point.
> > > > Of course you might argue that the latter case can be prevented by 
> > > > carefully reading the release notes, but we know how often that happens.
> > > > 
> > > > I'd argue that if you're using `-Wthread-safety-reference`, you're 
> > > > already opting into warnings on escaping references, and not warning on 
> > > > `return` is a false negative.
> > > > 
> > > > A separate flag would make sense to me if we want to keep it, for 
> > > > example because this produces a substantial amount of false positives 
> > > > under some circumstances. Did you try this on a larger code base that's 
> > > > using the annotations? I could try it on our code, and maybe we can get 
> > > > some Googler to test it on theirs, which is also heavily using Thread 
> > > > Safety Analysis. (Though I'm not sure whether they use 
> > > > `-Wthread-safety-reference`.)
> > > I don't have a strong opinion for where the warning should go.  We are 
> > > indeed using `-Wthread-safety-reference`, though we're not enabling 
> > > -Werror on these, so adding more warnings is fine for us.
> > > 
> > > I've run the check on a small sample of our codebase (which I don;t claim 
> > > to be representative, I can do a larger analysis if needed). The warnings 
> > > are more or less evenly split between missing annotations and actual 
> > > bugs. I don't think any of the things I've seen qualify as false 
> > > positives.
> > > 
> > > Among the missing annotations, most of the warnings are missing 
> > > `ABSL_EXCLUSIVE_LOCKS_REQUIRED` on the function. In a small number of 
> > > cases, the pattern is that a variable is lazily initialized under a lock 
> > > and then returned by reference:
> > > 
> > > ```
> > > struct LazyImmutableThing {
> > >   const Thing& Get() {
> > >     {
> > >       MutexLock lock(&mutex_);
> > >       thing_->Initialize();
> > >     }
> > >     
> > >     return thing_;
> > >   }
> > >   
> > >   Mutex mutex_;
> > >   Thing thing_ GUARDED_BY(mutex_);
> > > };
> > > ```
> > > 
> > > I consider this to be a missing annotation as the returned value is 
> > > dynamically immutable, the proper fix would be `return 
> > > TS_UNCHECKED_READ(thing_)`.
> > > 
> > > 
> > > Most actual bugs are along the lines of:
> > > 
> > > ```
> > > struct S {
> > >   T& Get() const {
> > >     MutexLock lock(&mutex_);
> > >     return obj_;
> > >   }
> > > 
> > >   Mutex mutex_;
> > >   T obj_ GUARDED_BY(mutex_);
> > > };
> > > ```
> > > 
> > > though some are missing the guard altogether (`T& Get() const { return 
> > > obj_; }`).
> > > 
> > > There are a few possible fixes. In rough order of occurrence:
> > >  - Return by value as the copy is not too expensive and memory ordering 
> > > does not matter.
> > >  - Let the caller take the lock and annotate with 
> > > `ABSL_EXCLUSIVE_LOCKS_REQUIRED` when the `Get` method is not called too 
> > > often.
> > >  - Let `Get` take a callback and process the value under the lock instead 
> > > of returning it (when ordering matters).
> > > 
> > > In a small number of cases, the pattern is that a variable is lazily 
> > > initialized under a lock and then returned by reference:
> > 
> > I wonder why that's safe, is the initialization guarded to happen only 
> > once? Some kind of double-checked locking pattern perhaps? Otherwise it 
> > seems that reads could happen in parallel to writes. If it's a checked 
> > initialization, then I think the proper way to model this is:
> > * The initialization acquires a lock to exclude other initializations 
> > running in parallel. Reads cannot happen, because the reference has not yet 
> > escaped.
> > * After initialization, we essentially acquire an implicit shared lock. 
> > This is not tracked as a proper lock, but it doesn't need to: there are no 
> > more writes until the end of lifetime, so nobody will acquire another 
> > exclusive lock.
> > One could model this by creating a mutex wrapper that can be locked once in 
> > exclusive mode, and after that hands out shared locks to everybody who 
> > wants one without keeping track. (As this is slightly off-topic, we don't 
> > need to discuss this here though.)
> > 
> > Other than than, this matches what I'm seeing in our code.
> > I wonder why that's safe, is the initialization guarded to happen only 
> > once? Some kind of double-checked locking pattern perhaps? 
> 
> Yes, it looks like this:
> 
> ```
> const T& get() const {
>     if (!value_set_.load(std::memory_order_acquire)) {
>       MutexLock lock(&lock_);
>       if (!value_set_.load(std::memory_order_relaxed)) {
>         value_ = ComputeValue();
>         value_set_.store(true, std::memory_order_release);
>       }
>     }
>     return value_;
>   }
> ```
> 
> I ended up silencing the return with a comment:
> 
> ```
> // `value_` is set once an for all, it won't change after this function 
> returns.
>  return ABSL_TS_UNCHECKED_READ(value_);
> ```
> 
> I agree that this is not very principled, but it is simple :)
> 
> > One could model this by creating a mutex wrapper that can be locked once in 
> > exclusive mode, and after that hands out shared locks to everybody who 
> > wants one without keeping track. 
> 
> This is a cool idea. If I understand correctly, it does mean that the 
> *caller* of `get` has to grab a untracked shared lock ? 
This looks like the [Double-checked 
locking](https://en.wikipedia.org/wiki/Double-checked_locking) pattern. I've 
come to the conclusion that `guarded_by` annotations are not appropriate for 
them (just copying what I wrote to someone internally):

* The initialization flag (here `value_set_`) is read before acquiring the 
mutex, which means it can't be protected by the mutex.
* The content (here `value_`), if already initialized, is also read without 
acquiring the mutex, so it can't be protected by it. The content, if only 
readable, is "protected" by atomic ordering: the initialization must have a 
release barrier on (or before) writing the initialization flag, which 
synchronizes with an acquire barrier on (or after) the initial read of the 
initialization flag. If the content is writable, it will need to be 
synchronized on its own.
* The mutex might protect data being used in the initialization. Other than 
that it's just a performance optimization: having multiple threads initialize 
the same object would be wasteful. Especially since the pattern makes only 
sense for expensive initialization. (Expensive either in time or memory 
consumption.)
* Lastly, this pattern is usually used in a single function, so there is no 
risk of forgetting the mutex.


================
Comment at: clang/lib/Analysis/ThreadSafety.cpp:44
 #include "llvm/ADT/STLExtras.h"
+#include "llvm/ADT/SmallPtrSet.h"
 #include "llvm/ADT/SmallVector.h"
----------------
I wonder where we're using that, is this the leftover of an earlier version?


================
Comment at: clang/lib/Analysis/ThreadSafety.cpp:1248
   if (const auto *P = dyn_cast<til::Project>(SExp)) {
-    if (!CurrentMethod)
+    if (CurrentFunction == nullptr || !isa<CXXMethodDecl>(CurrentFunction))
       return false;
----------------



================
Comment at: clang/lib/Analysis/ThreadSafety.cpp:1546-1547
   FactSet FSet;
+  // The fact set for the function (i.e., its entry block).
+  const FactSet &FunctionFSet;
   /// Maps constructed objects to `this` placeholder prior to initialization.
----------------
Shouldn't it be the (expected) exit set if we're talking about `return`? Also 
I'd suggest `(Function)EntryFSet` (or with `Exit`).


================
Comment at: clang/lib/Analysis/ThreadSafety.cpp:2298
   if (!SortedGraph->empty() && D->hasAttrs()) {
-    const CFGBlock *FirstBlock = *SortedGraph->begin();
-    FactSet &InitialLockset = BlockInfo[FirstBlock->getBlockID()].EntrySet;
----------------
Maybe it makes sense to keep an assertion here like 
`assert(*SortedGraph->begin() == &CFGraph->getEntry());`.


================
Comment at: clang/lib/Analysis/ThreadSafety.cpp:2309
   // Mark entry block as reachable
   BlockInfo[CFGraph->getEntry().getBlockID()].Reachable = true;
 
----------------



================
Comment at: clang/lib/Analysis/ThreadSafety.cpp:2501-2519
   // By default, we expect all locks held on entry to be held on exit.
   FactSet ExpectedExitSet = Initial->EntrySet;
 
   // Adjust the expected exit set by adding or removing locks, as declared
   // by *-LOCK_FUNCTION and UNLOCK_FUNCTION.  The intersect below will then
   // issue the appropriate warning.
   // FIXME: the location here is not quite right.
----------------
Here we build the `ExpectedExitSet`. You might have to move this if we're using 
it earlier.


================
Comment at: clang/test/SemaCXX/warn-thread-safety-analysis.cpp:5630
+
+};
+
----------------
For the entry/exit set issue, can you add a function that acquires a mutex (and 
doesn't release it), returning something protected by the mutex? And maybe one 
that releases but doesn't acquire.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D153131/new/

https://reviews.llvm.org/D153131

_______________________________________________
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[PATCH] D153131: [clang analysis][thread-safety] Handle return-by-reference...

Reply via email to