On Jan 5, 2015, at 12:49 AM, Bernd Edlinger <[email protected]> wrote:
> On Sun, 4 Jan 2015 14:18:59, Mike Stump wrote:
>>
>>> But tsan still gets at least 90% chance to spot that. As a matter of fact,
>>> the tsan runtime is just _incredibly_ fast,
>>> and catches errors, with a very high reliability. But it is racy by design.
>>
>> You say by design. I’m skeptical of that. Can you quote the email or the
>> code were they said I wanted to design a system that wasn’t 100% reliable,
>> because the best they could do was 2% slower, and they didn’t want it to be
>> 2% slower? I tend to think someone didn’t figure out they could atomically,
>> locklessly do the update, or they didn’t understand the performance hit on
>> the lockless code was small, or worse, they had no clue it was racy. I’d be
>> curious to know. Maybe, there are other complications that prevent it from
>> being non-racy.
>
> see for instance this thread "Is TSan an exact tool?":
> https://groups.google.com/forum/#!topic/thread-sanitizer/mB73m6Nltaw
Yeah, that’s not the issue. The issue is, in this one narrow case, was is
possible to use a lockless method of updating the data so that the tool would
be slightly better. That post doesn’t address this question. For example, if
a patch were submitted to locklessly update so both can notice who is first,
was it rejected and why? Anyway, that’s a side issue, and we can safely ignore
it. I’m fine with making testing reliable with how it works today.
> Nevertheless I would prefer to silence the test now with sleep(1),
> And come back later with a follow-up patch to use a un-instrumented and
> non-interceptable synchronization primitive in all tsan tests. As it is at
> least
> not obvious how to do that in the test suite. But it must be possible.
I thought the code was posted, I thought how to put it in a file was posted. I
think it is a 2 minute endeavor to put all those pieces together and run a test
case? You do know how to run just 1 tsan test case with make
RUNTESTFLAGS=tsan.exp=aligned_vs_unaligned_race.C check-g++ sort of thing?
This can shorten the testing time from hours per iteration to seconds per
iteration. Slightly non-obvious, but very handy when developing some types of
test cases.
So, to help you out, I tried my hand at wiring up the extra library code:
gcc/gcc/testsuite/gcc.dg/tsan$ cat test1.c
/* { dg-additional-options "-O2" } */
/* { dg-additional-sources "lib.c" } */
void sync(int);
int main(int argc, char**argv) {
sync(1);
sync(2);
sync(3);
}
gcc/gcc/testsuite/gcc.dg/tsan$ cat lib.c
/* { dg-do compile } */
#include <stdlib.h>
volatile int serial;
__attribute__((no_sanitize_address))
void sync(int i) {
while (serial != i-1) ;
while (1) {
if (++serial == i)
return;
--serial;
sleep ();
}
}
and it ran just fine. The options are for both files. I read some of the tsan
documentation, and they seem to claim you can have tsan not process a function
by merely asking. I did that. I don’t know if that is enough to hide all the
semantics that need to be hidden.
In the above code, the ++ and -- should be an atomic increment/decrement.
The idea is that can can introduce a deterministic ordering to the execution of
the code by adding sync(n), where n is the step number, whose range starts at 1.
So, for example, here is the test case, with the bits added, so you can see how
I transformed what would have been the sleep into the use of the synchronizing
primitive from the library.
/* { dg-additional-sources "lib.c" } */
/* { dg-shouldfail "tsan" } */
#include <pthread.h>
#include <stdio.h>
#include <stdint.h>
#include <unistd.h>
void sync(int);
uint64_t Global[2];
void *Thread1(void *x) {
sync (2);
Global[1]++;
return NULL;
}
void *Thread2(void *x) {
char *p1 = reinterpret_cast<char *>(&Global[0]);
struct __attribute__((packed, aligned(1))) u_uint64_t { uint64_t val; };
u_uint64_t *p4 = reinterpret_cast<u_uint64_t *>(p1 + 1);
(*p4).val++;
sync (1);
return NULL;
}
int main() {
pthread_t t[2];
pthread_create(&t[0], NULL, Thread1, NULL);
pthread_create(&t[1], NULL, Thread2, NULL);
pthread_join(t[0], NULL);
pthread_join(t[1], NULL);
printf("Pass\n");
/* { dg-output "WARNING: ThreadSanitizer: data race.*(\n|\r\n|\r)" } */
/* { dg-output "Pass.*" } */
return 0;
}
The question is, does this 100% reliably solve the problem?