Hi Sho,

I just came across your project and would like to add a few comments.

I think the biggest problem in libgomp is that tasks are stored in a
single queue. This may work for a few threads or for long-running tasks,
but certainly not for 48 threads. In fact, contention on the queue can
grow so high that performance starts to degrade when you add more
threads.

First of all, you need to think about data structures for storing tasks.
When you look at other parallel runtime systems/libraries, you will find
that many implement some form of work-stealing scheduling. In
work-stealing scheduling, each worker or group of workers have their own
queue on which they operate. Load balancing is done by stealing tasks
from the queues of other workers. The Nanos runtime is probably a good
place to start to get some ideas...  

Regarding Lazy Task Creation (LTC)... I agree with the advice here and
would say this is not what you should focus on. The most prominent use
of LTC is in the Cilk multithreaded language. Basically, we have two
options when we encounter a new task: (1) spawn the task as a child and
continue executing the parent, or, what LTC does, (2) start executing
the task and leave the continuation (the rest of the parent) for later
execution. Load balancing requires that the continuation is made
available so that it can be picked up and executed by other threads. The
Nanos runtime supports different scheduling strategies, including
work-first scheduling, which is similar in principle to LTC. Nanos
implements tasks as user-level threads (nano-threads), so it's easier to
switch between tasks or move tasks across threads. Something like
work-first scheduling that depends on untied tasks is nice to have. But
I think a decent scheduler (for tied tasks) is more important at this
point...

-Andreas

Reply via email to