Hi Andreas,

Thank you for your comments.

> I think the biggest problem in libgomp is that tasks are stored in a
> single queue. This may work for a few threads or for long-running tasks,
> but certainly not for 48 threads. In fact, contention on the queue can
> grow so high that performance starts to degrade when you add more
> threads.
>
> First of all, you need to think about data structures for storing tasks.
> When you look at other parallel runtime systems/libraries, you will find
> that many implement some form of work-stealing scheduling. In
> work-stealing scheduling, each worker or group of workers have their own
> queue on which they operate. Load balancing is done by stealing tasks
> from the queues of other workers. The Nanos runtime is probably a good
> place to start to get some ideas...

I totally agree with this point.
Currently, I'm planning to implement tied task using breath-first
scheduler wrote in
section 3.1 of "Evaluation of OpenMP Task Scheduling Strategies" by Nanos Group.
http://www.sarc-ip.org/files/null/Workshop/1234128788173__TSchedStrat-iwomp08.pdf

That is:
* A team has one team queue which contains tasks.
* A team has some (user-level) threads.
* A thread can have one running task.
* A thread has private queue which contains tasks.
* When a task is created, it is queued in team queue.
* Each thread steals tasks from the team queue and inserts it in the
private queue.
* Once tied task is executed in a thread, it is queued only in the
private queue in the thread
  when it encounters `taskwait'.
* Each thread runs a task from its private queue.

But I'm not sure how to achieve good load-balancing and what kind of
cutoff strategy to take.
As for load-balancing, I'll read Nanos4 implementations and ask Nanos
Group for it.
(Of course your advice will do :-) )

As for cutoff, basically I can choose `max-tasks' strategy or
`max-levels' strategy.
When number of tasks or recursion levels exceed this value, the
scheduler stops its work
and execute each task as sentences in sequential programs.
But "Evaluation of OpenMP Task Scheduling Strategies" says better
cutoff strategy is different
from application to application.


> Regarding Lazy Task Creation (LTC)... I agree with the advice here and
> would say this is not what you should focus on. The most prominent use
> of LTC is in the Cilk multithreaded language. Basically, we have two
> options when we encounter a new task: (1) spawn the task as a child and
> continue executing the parent, or, what LTC does, (2) start executing
> the task and leave the continuation (the rest of the parent) for later
> execution. Load balancing requires that the continuation is made
> available so that it can be picked up and executed by other threads. The
> Nanos runtime supports different scheduling strategies, including
> work-first scheduling, which is similar in principle to LTC. Nanos
> implements tasks as user-level threads (nano-threads), so it's easier to
> switch between tasks or move tasks across threads. Something like
> work-first scheduling that depends on untied tasks is nice to have. But
> I think a decent scheduler (for tied tasks) is more important at this
> point...

In GSoC project, I will not implement LTC but just focus on schedulers
which Nanos4 has already has.
I'll first implement `tied task' using breadth-first scheduler, and after that
I'll implement `untied task' using work-first scheduler (if I have time).

Thanks,
--
Sho Nakatani

Reply via email to