From: Antoniu Pop <antoniu....@mines-paristech.fr>
Date: Tue, 5 Apr 2011 15:45:26 +0200
> On Tue, Apr 5, 2011 at 3:01 PM, Sho Nakatani <dev.laysak...@gmail.com> wrote:
>> From: Jakub Jelinek <ja...@redhat.com>
>> Date: Tue, 5 Apr 2011 08:33:23 +0200
>>
>>> On Tue, Apr 05, 2011 at 11:05:01AM +0900, Sho Nakatani wrote:
>>>> - When task A encounters a "#pragma omp task" directive, the worker
>>>>   creates a task and immediately executes it. The worker pushes A onto
>>>>   the head of its deque.
>>>
>>> Immediately starting a freshly created task on #pragma omp task is nice
>>> for cache locality, ...
>>
>> Yes. Lazy Task Creation also has another good aspect.
>> Look at the pictures below. Both show the call tree of fibonacci(8);
>> one is compiled by gcc and the other by icc.
>>
>> (GCC)
>> https://github.com/laysakura/GCC-OpenMP-Speedup/raw/e5671e7f4175c3ac17c1543c93edf25dda2ae6ac/test/calltree/openmp-fibonacci-calltree-gcc.png
>> (ICC)
>> https://github.com/laysakura/GCC-OpenMP-Speedup/raw/e5671e7f4175c3ac17c1543c93edf25dda2ae6ac/test/calltree/openmp-fibonacci-calltree-icc.png
>>
>> Nodes with the same color represent tasks processed on the same thread
>> (or core).
>> These pictures show that the program compiled by icc, which might use
>> Lazy Task Creation, executes the nodes of a subtree on the same core,
>> while the one compiled by gcc does not.
>> Of course, fib(8) alone doesn't clearly show whether Lazy Task Creation
>> really benefits programs that need sophisticated parallelization, but
>> other research, such as
>> https://iwomp.zih.tu-dresden.de/downloads/runtime-olivier.pdf
>> also points this out.
>>
>> My teacher and senior colleagues know Lazy Task Creation well, and I
>> also learned about it from a Japanese paper.
>> My teacher recommended me an English-language paper on Lazy Task
>> Creation; please google the title and take a look if you have time:
>> "Lazy Task Creation: A Technique for Increasing the Granularity of
>> Parallel Programs"
>>
>>> Immediately starting a freshly created task on #pragma omp task is nice
>>> for cache locality, except it doesn't work at all for tied tasks which
>>> you can't move to other threads. For tied tasks (and GCC currently has
>>> all tasks tied) it serializes everything.
>>
>> Then, how about implementing `untied task' as a GSoC project?
>> If it goes well, I'll also tackle a better `tied task' implementation.
>> What do you think about this idea?
>> I'd like to write my proposal tomorrow :-)
>
> I think this could indeed be quite interesting, and a much needed
> improvement to GCC's OpenMP runtime. The overall idea behind Lazy Task
> Creation looks good.
> However, I would recommend starting with bibliographical work. I'm
> certain this solution could be an improvement over the current state,
> but it would be nice to know how it compares to other techniques, and
> also whether there are slowdowns in less dynamic and unbalanced cases.
> The work I'm aware of in this area is "Evaluation of OpenMP task
> scheduling strategies" by Duran et al.
> (http://www.sarc-ip.org/files/null/Workshop/1234128788173__TSchedStrat-iwomp08.pdf).
> It could be a reasonable starting point.

Thank you so much for the realistic suggestion. Making the libgomp
implementation better is not an easy thing, so I'll learn a lot about
other implementations. Learning about parallel programming in depth will
surely help my graduation thesis, too :-)

> You may also want to make sure the scheduling policy it enforces
> cannot interfere with OpenMP semantics. For example, how would lazy
> task creation behave if a worker steals a bottom-of-the-stack task
> that has a taskwait while the tasks bound to that synchronization are
> left in the other worker's deque? There can be deadlock issues or
> just plain performance issues that can arise. I can't see any obvious
> issues, but it's worth thinking about.
> I believe the GSoC idea for working on the libGOMP scheduler is great!
>
> Antoniu

--
Sho Nakatani