From: Antoniu Pop <antoniu....@mines-paristech.fr>
Date: Tue, 5 Apr 2011 15:45:26 +0200

> On Tue, Apr 5, 2011 at 3:01 PM, Sho Nakatani <dev.laysak...@gmail.com> wrote:
>> From: Jakub Jelinek <ja...@redhat.com>
>> Date: Tue, 5 Apr 2011 08:33:23 +0200
>>
>>> On Tue, Apr 05, 2011 at 11:05:01AM +0900, Sho Nakatani wrote:
>>>>   - When task A encounters a "#pragma omp task" directive, the worker
>>>>     creates a task and immediately executes it. The worker pushes A onto
>>>>     the head of its deque.
>>>
>>> Immediately starting a freshly created task on #pragma omp task is nice
>>> for cache locality, ...
>>
>> Yes. Also, Lazy Task Creation has another good aspect.
>> Look at the pictures below. Both show the call tree of fibonacci(8).
>> One is compiled by gcc and the other by icc.
>>
>> (GCC)
>> https://github.com/laysakura/GCC-OpenMP-Speedup/raw/e5671e7f4175c3ac17c1543c93edf25dda2ae6ac/test/calltree/openmp-fibonacci-calltree-gcc.png
>> (ICC)
>> https://github.com/laysakura/GCC-OpenMP-Speedup/raw/e5671e7f4175c3ac17c1543c93edf25dda2ae6ac/test/calltree/openmp-fibonacci-calltree-icc.png
>>
>> Nodes with the same color represent tasks processed on the same thread
>> (or core).
>> These pictures show that the program compiled by icc, which might use
>> Lazy Task Creation, executes the nodes of a subtree on the same core,
>> while the one compiled by gcc doesn't.
>> Of course, fib(8) alone doesn't clearly show whether Lazy Task Creation
>> really benefits programs that need sophisticated parallelization. But
>> other research, such as
>>    https://iwomp.zih.tu-dresden.de/downloads/runtime-olivier.pdf
>> also points this out.
>>
>> My teacher and senior colleagues know a lot about Lazy Task Creation,
>> and I also learned about it from some Japanese papers.
>> My teacher recommended a paper related to Lazy Task Creation (in English).
>> Please google the title and take a look if you have time:
>>    "Lazy Task Creation: A Technique for Increasing the Granularity of
>>     Parallel Programs"
>>
>>
>>> Immediately starting a freshly created task on #pragma omp task is nice
>>> for cache locality, except it doesn't work at all for tied tasks, which
>>> you can't move to other threads.  For tied tasks (and GCC currently has
>>> all tasks tied) it serializes everything.
>>
>> Then, how about implementing `untied tasks' as a GSoC project?
>> If it goes well, I'll also tackle improving `tied tasks'.
>> What do you think about this idea?
>> I'd like to write my proposal tomorrow :-)
> 
> I think this could indeed be quite interesting, and a much-needed
> improvement to GCC's OpenMP runtime. The overall idea behind Lazy Task
> Creation looks good.
> However, I would recommend starting with bibliographical work. I'm
> certain this solution could be an improvement over the current state,
> but it would be nice to know how it compares to other techniques, and
> also if there are slowdowns in less dynamic and unbalanced cases.
> The work I'm aware of in this area is "Evaluation of OpenMP task
> scheduling strategies" by Duran et al
> (http://www.sarc-ip.org/files/null/Workshop/1234128788173__TSchedStrat-iwomp08.pdf).
> It could be a reasonable starting point.
> 

Thank you so much for the practical suggestion.
Making the libgomp implementation better is not an easy thing,
so I'll study other implementations carefully.
Learning deeply about parallel programming will surely help
with my graduation thesis :-)

> You may also want to make sure the scheduling policy it enforces
> cannot interfere with OpenMP semantics. For example, how would lazy
> task creation behave if a worker steals a bottom-of-the-stack task
> that has a taskwait, while the tasks bound to this synchronization
> are left in the other worker's deque? Deadlock issues or plain
> performance issues could arise. I can't see any obvious problems,
> but it's worth thinking about.
> 
> I believe the GSoC idea for working on the libGOMP scheduler is great!
> 
> Antoniu

--
Sho Nakatani
