I'm not sure this is the problem that you're seeing, but I see a
problem with the example. It boils down to the fact that futures do not
provide concurrency.

That may sound like a surprising claim, because the whole point of
futures is to run multiple things at a time. But futures merely offer
best-effort parallelism; they do not provide any guarantee of
concurrency.
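Here is a minimal sketch of that distinction (not from your program): creating a future only starts it speculatively, and only `touch` creates an obligation for it to complete.

```racket
#lang racket
;; A future may run in parallel, but nothing guarantees that it runs
;; at all until some thread demands its result with `touch`.
(define f (future (lambda () (+ 1 41))))
;; ... f may or may not have executed by this point ...
(displayln (touch f)) ; `touch` forces f to complete; prints 42
```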

As a consequence, trying to treat an fsemaphore as a lock can go wrong.
If a future manages to take an fsemaphore lock, but the future is not
demanded by the main thread --- or in a chain of future demands that
are demanded by the main thread --- then nothing obliges the future to
continue running; it can hold the lock forever.
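As a sketch of that failure mode (hypothetical code, not your program): a future can take an fsemaphore and then suspend on a blocking operation, and with no `touch` pending, nothing obliges it to resume and release the "lock". Using the non-blocking `fsemaphore-try-wait?` below just keeps the sketch itself from hanging.

```racket
#lang racket
;; Hypothetical sketch: a future takes an fsemaphore used as a lock,
;; then hits a blocking operation (`sleep`), which suspends the future.
;; With no demand on it, it may hold the lock indefinitely.
(define lock (make-fsemaphore 1))
(define f
  (future
   (lambda ()
     (fsemaphore-wait lock)
     (sleep 1)               ; blocking: the future suspends here
     (fsemaphore-post lock))))
(sleep 0.1) ; give f a chance to start speculatively
;; `(fsemaphore-wait lock)` here could block forever, so probe instead:
(if (fsemaphore-try-wait? lock)
    (displayln "main thread got the lock")
    (displayln "lock is held by an undemanded future"))
```

Which branch runs depends on scheduling, which is exactly the point: the main thread's progress hinges on whether an undemanded future happened to run.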

(I put the blame on fsemaphores. Adding fsemaphores to the future

system was something like adding mutation to a purely functional
language. The addition makes certain things possible, but it also
breaks local reasoning that the original design was supposed to
enable.)

In your example program, I see

 (define workers (do-start-workers))
 (displayln "started")
 (for ((i 10000))
   (mfqueue-enqueue! mfq 1))

where `do-start-workers` creates a chain of futures, but there's no
`touch` on the root future while the loop calls `mfqueue-enqueue!`.
Therefore, the loop can block on an fsemaphore because some future has
taken the lock but stopped running for whatever reason.

In this case, adding `(thread (lambda () (touch workers)))` after
"started" is displayed and before the loop might fix the example. In
other words, you can use
the `thread` concurrency construct in combination with the `future`
parallelism construct to ensure progress. I think this will work
because all futures in the program end up in a linear dependency chain.
If there were a tree of dependencies, then I think you'd need a
`thread` for each `future` to make sure that every future has an active
demand.
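A sketch of that combination (hypothetical names, not your program): one ordinary `thread` keeps a live demand on the future via `touch`, so the future is obliged to make progress while the main thread produces work.

```racket
#lang racket
;; Hypothetical sketch: a `thread` supplies a continuous demand on the
;; future while the main thread does its own work.
(define work (make-fsemaphore 0))
(define worker
  (future
   (lambda ()
     (fsemaphore-wait work) ; wait for an item
     'done)))
;; Demand the future from a separate thread so it must keep running:
(thread (lambda () (touch worker)))
;; The main thread can now produce work without blocking on `touch`:
(fsemaphore-post work)
;; A later `touch` from the main thread returns the completed result:
(displayln (touch worker)) ; prints: done
```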

If you're seeing a deadlock at the `(touch workers)`, though, my
explanation doesn't cover what you're seeing. I haven't managed to
trigger the deadlock myself.

At Sat, 23 May 2020 18:51:23 +0200, Dominik Pantůček wrote:
> Hello again with futures!
> 
> I started working on futures-based workers and got quickly stuck with a
> dead-lock I think does not originate in my code (although it is two
> semaphores, 8 futures, so I'll refrain from strong opinions here).
> 
> I implemented a very simple futures-friendly queue using mutable pairs
> and created a minimal-deadlocking-example[1]. I am running racket 3m
> 7.7.0.4 which includes fixes for the futures-related bugs I discovered
> recently.
> 
> Sometimes the code just runs fine and shows the numbers of worker
> iterations performed in different futures (as traced by the 'fid'
> argument). But sometimes it locks in a state where there is one last
> number in the queue (0 - zero) and yet the fsemaphore-count for the
> count fsemaphore returns 0. Which means the semaphore was decremented
> twice somewhere. The code is really VERY simple and I do not see a
> race-condition within the code, that would allow any code path to
> decrement the fsema-count fsemaphore twice once the worker future
> receives 0.
> 
> I am able to reproduce the behavior with racket3m running under gdb and
> get the stack traces for all the threads pretty consistently. The
> deadlock is apparently at:
> 
>   2    Thread 0x7ffff7fca700 (LWP 46368) "mfqueue.rkt"
> futex_wait_cancelable (private=<optimized out>, expected=0,
> futex_word=0x5555559d8e78) at ../sysdeps/nptl/futex-internal.h:183
> 
> But that is just where the issue is showing up. The real question is how
> the counter gets decremented twice (given that fsemaphores should be
> futures-safe).
> 
> Any hints would be VERY appreciated!
> 
> 
> Cheers,
> Dominik
> 
> [1] http://pasterack.org/pastes/28883
> 
> -- 
> You received this message because you are subscribed to the Google Groups 
> "Racket Users" group.
> To unsubscribe from this group and stop receiving emails from it, send an 
> email to [email protected].
> To view this discussion on the web visit 
> https://groups.google.com/d/msgid/racket-users/5dcf1260-e8bf-d719-adab-5a0fd937
> 8075%40trustica.cz.
