Hi Mike,
Yes, I can quantify the cost. Is it very high.
Here is the patch that I used:
--- rtl/tsan_rtl.cc (revision 226644)
+++ rtl/tsan_rtl.cc (working copy)
@@ -709,7 +709,11 @@
ALWAYS_INLINE USED
void MemoryAccess(ThreadState *thr, uptr pc, uptr addr,
int kAccessSizeLog, bool kAccessIsWrite, bool kIsAtomic) {
u64 *shadow_mem = (u64*)MemToShadow(addr);
+
+ atomic_fetch_add((atomic_uint64_t*)shadow_mem, 0, memory_order_acq_rel);
+
On the standard tsan benchmark that does 8-byte writes:
before:
[ OK ] DISABLED_BENCH.Mop8Write (1161 ms)
after:
[ OK ] DISABLED_BENCH.Mop8Write (5085 ms)
So that's 338% slowdown.
On Mon, Jan 19, 2015 at 6:12 PM, Mike Stump <[email protected]> wrote:
> On Jan 19, 2015, at 12:47 AM, Dmitry Vyukov <[email protected]> wrote:
>> Long story short. Tsan has a logical data race the core of data race
>> detection algorithm. The race is not a bug, but a deliberate design
>> decision that makes tsan considerably faster.
>
> Could you please quantify that for us? Also, what lockless update method did
> you use? Did you try atomic increment? On my machine, they are as cheap as
> stores; can’t imagine they could be slow at all. If the latency and
> bandwidth of atomic increment is identical to store, would the costs be any
> higher than using a store to update the tsan data? A proper port of tsan to
> my machine would make use of atomic increment. I consider it a simple matter
> to sequence the thread termination and the output routine to ensure that all
> the updates in the threads happen before the output routine runs. The output
> routine strikes me as slow, and thread termination strikes me as slow, so
> taking a little extra time there seems reasonable. Was the excessive cost
> you saw due to the termination costs?
>
>> So ironically, if the race memory accesses happen almost simultaneously,
>> tsan can miss the
>> race.
>> Thus we have sleeps.
>
> I’ve not seen a reason why the test suite should randomly fail. The gcc test
> suite does not. Could you explain why the llvm test suite does? Surely you
> know that sleep is not a synchronization primitive?
>
>> Sleeps vs barrier is being discussed in the "Fix parameters of
>> __tsan_vptr_update" thread.
>
> When finished, let us know the outcome. To date, I’ve not seen any
> compelling reason to have the core of tsan be other than deterministic and
> the test suite other than deterministic. I’d love to see the backing for
> such a decision.
>
>> I would really like to keep llvm and gcc tests in sync as much as possible.
>
> Excellent, from my perspective, that would mean that you make the llvm test
> suite deterministic.