GSoC openMP task scheduling Advice
Hi, I'm MSc High-Performance Computing student at Polytechnic University of Catalonia(BarcelonaTech). I'm interesting openmp task scheduling optimization or openmp 3.1 facility taskyield. @For Task scheduling I'm using mercurium compiler already at my university because the compiler was developed by my university. In fact i'm working on openACC integration at mercurium. However i haven't enough knowledge nanos scheduler. I read 3 articles that you recommend related task scheduling. Cilk and work-first scheduling algorithms makes sense for me with cut-off mechanism. I suppose cut-off mechanism will decide on run-time. Now I'm trying to understand which task scheduler is better in some cases at nanos scheduler (https://pm.bsc.es/projects/nanox/wiki/UserManual/Schedule). I'll analyze result with PARAVER profiling tool. However i am not sure for this topics. Should i develop a new task-scheduler? otherwise should I optimize existing scheduler? King regards, Güray Özen Polytechnic University of Catalonia
Re: GSoC openMP task scheduling Advice
Dear All, I'm trying to speed up the OpenMP implementation in GCC and Intel compiler. I'm working on multisort because i thought it has too much branches and i applied tree model task. I used Paraver and Extrae to profile program After that, I executed them on a machine Minotauro supercomputer in top500 with Intel Xeon E5649 E5649 (6-Core, each core has 2 threads) a 2.53 GHz. The following report shows the OpenMP in GCC scheduling vs Intel C Compiler https://github.com/grypp/gcc-gsoc-taskscheduler/raw/master/report.pdf And Here is my omp code https://github.com/grypp/gcc-gsoc-taskscheduler/blob/master/multisort-omp.c I thought gcc tasks/threads waiting too much on the idle than intel compiler's threads. In addition to I tried to profiling nanos task scheduler with mercurium compiler. I used 3 scheduler types in nanos. These're respectively -Work-First (WF) -Breadth-First (BF) -Cilk (CILK) I run this program on computer has 12 processors. The following image shows the trace of schedulers: https://raw.github.com/grypp/gcc-gsoc-taskscheduler/master/nns.png And Here is my ompss code https://github.com/grypp/gcc-gsoc-taskscheduler/blob/master/multisort-ompss-tree.c In addition to i got speed up interestingly breadth-first more than others. Texecution times: -Work-First (WF): ~0.5 -Breadth-First (BF): ~0.22 -Cilk (CILK): ~0.49 I also read this mail "http://gcc.gnu.org/ml/gcc/2011-04/msg00040.html";. I suppose lazy task creatation algorithm was implemented on gcc. I do not know. I wonder? Lazy task creation algorithm may good. But task scheduling is not good. Currently, I'm planning to change task scheduling algorithm with BF. But i am not sure it's good or not. In addition to i'll ask some idea related task scheduling my professor who is the main writer papers that gave you and is director of the barcelona supercomputer center. Are there any advice your? Regards, Güray Özen Polytechnic University of Catalonia 2013/4/26 guray.ozen : > Hi, > > I'm MSc High-Performance Computing student at Polytechnic University > of Catalonia(BarcelonaTech). I'm interesting openmp task scheduling > optimization or openmp 3.1 facility taskyield. > > @For Task scheduling > I'm using mercurium compiler already at my university because the > compiler was developed by my university. In fact i'm working on > openACC integration at mercurium. However i haven't enough knowledge > nanos scheduler. > > I read 3 articles that you recommend related task scheduling. Cilk and > work-first scheduling algorithms makes sense for me with cut-off > mechanism. I suppose cut-off mechanism will decide on run-time. > > Now I'm trying to understand which task scheduler is better in some > cases at nanos scheduler > (https://pm.bsc.es/projects/nanox/wiki/UserManual/Schedule). I'll > analyze result with PARAVER profiling tool. > > However i am not sure for this topics. Should i develop a new > task-scheduler? otherwise should I optimize existing scheduler? > > > King regards, > > > Güray Özen > Polytechnic University of Catalonia
Re: GSoC openMP task scheduling Advice
Dear All, Thank you for your reply Tobias. By the way Mr Jakup I hope my approach is make sense for you. I changed GOMP_SPINCOUNT factor and i got speedup more than. I attached my trace that was profiled extrae and paraver. Light blue mean idle, Dark blue mean running, Yellow scheduling, Fork/Join. First trace belongs to intel trace with default configuration. 2nd trace with spincount=10, 3rd spincount=100, 5rd with spincount=infinity. In addition to you can see running time as nanosecond bottom right corner. https://raw.github.com/grypp/gcc-gsoc-taskscheduler/master/openmp.png In my opinion gcc is little bit slower tha intel because of task scheduling. Some threads (for example thread number 4,9,10 in the multisort-omp-12-spin10 trace or 7,9,11 threads in the multisort-omp-12-infinity trace) waiting too much to other threads. You can see my trace image, My advice i want to change or add task scheduler algorithm. The first thing I really want to add work-stealing mechanism. In this way i can decrease task waiting time and i can provide load-balancing. Also task stealing has a strategy that Untied task can steal parents task. For example parent task-stealing is very good for parallel-multisort algortihm or recursive tasks. If parents task cannot be stolen then the default task stealing techniques is used. In addition to i can change task scheduler as follows : when a task is created, the creating task is suspended and the executing thread switches to the newly created task. When a task is suspended the task is placed in a per thread local pool. So that i can provide better data locality. I'm working on this topics but i am not sure my idea is good or not. Probably i need change my way. Please can you help me? I would like to work with OpenMP in gcc. Regards, Güray Özen Polytechnic University of Catalonia 2013/4/30 Tobias Burnus : > guray.ozen wrote: >> >> I thought gcc tasks/threads waiting too much on the idle than intel >> compiler's threads. > > > Regarding busy waits, you could try to tune the values of the GOMP_SPINCOUNT > environment variable. Search for "@node GOMP_SPINCOUNT" in > http://gcc.gnu.org/viewcvs/gcc/branches/gomp-4_0-branch/libgomp/libgomp.texi?view=co&content-type=text%2Fplain > for details. > > If you have enough cores which are available, there shouldn't be a problem > with idle. (Except with tasks where one could argue that the threads should > do task stealing instead.) > > Tobias, > who leaves the other questions to Jakub
Re: GSoC openMP task scheduling Advice
Hi All, I applied gsoc for openMP taks scheduling and my advice may cover taskyield facility. Currently i have some idea for taskyield. i think i can add something. Therefore i wonder GCC mentor related about openMP was announced? or should i wait until "student acceptance"? Regards, Güray Özen Polytechnic University of Catalonia 2013/5/5 guray.ozen : > Dear Tobias, > > Thank for your interest. > > Yes actually my Master program managed by BSC. So generally my professors > work from BSC or Intel. I know BSC has a nanos group and the group works > generally task scheduling. But i don't know any people from this group. If i > get openmp task project, I'll meet them and i can request help. maybe they > have good trick to implement my advice. Or they may improve my advice. > > Finally I applied gsoc. I hope I'll be accepted :) my advice can cover > openmp3.1 taskyield facility beside task scheduling optimization. Of course > task scheduling contains task stealing. Is like to much i can work on gcc > for openmp. > > Regards, > > On May 3, 2013 1:08 AM, "Tobias Burnus" wrote: >> >> Dear Güray, >> >> seemingly, Jakub is currently rather busy - let me try to answer the >> questions, given that the GSoC dead line is tomorrow. >> >>> I'm MSc High-Performance Computing student at Polytechnic University >>> of Catalonia(BarcelonaTech). >> >> >> For curiosity, do you have something to do with the BSC? Or some other >> on-site person, who is knowledgeable in OpenMP scheduling. >> >> >> (I know that it is a separate entity, but it could be that you have still >> some relation with them - or one of your professors.) >> >>> However i am not sure for this topics. Should i develop a new >>> task-scheduler? otherwise should I optimize existing scheduler? >> >> >> I think it would be useful to try a different talk scheduler, especially >> with regards to task stealing. See http://www.cs.unc.edu/~olivier/ross11.pdf >> for a comparison. >> >> Something similar like the task scheduling of OpenUH's DEQUEUE PER_THREAD1 >> with a doubly linked chain would be useful to try. (That's alledgedly >> similar to the extension to the Qthreads multithreading library using the >> ROSE compiler, used in Olivier's paper.) >> >> I think having some implementation of that scheduling and comparing it >> with the current one would be useful - then one can decide whether the new >> one is better or the current one or whether both should be available for the >> user to switch between the two. >> >> In any case, it is useful to study the literature about the details - but >> then get quickly starting with the implementation to make it possible to >> really play with the scheduler in GCC. >> >> >> Am 01.05.2013 18:50, schrieb guray.ozen: >>> >>> My advice i want to change or add task scheduler algorithm. The first >>> thing I really want to add work-stealing mechanism. In this way i can >>> decrease task waiting time and i can provide load-balancing. >> >> >> Yes, that's in line with the idea above. I think one has to look at the >> literature at the algorithm - and then try to get it implemented. >> >> Regarding the GSoC application, it would be good to give some success >> criteria and time line - that helps later when implementing and also with >> the application. >> >> For some ideas how to write the application, see: >> http://drupal.org/node/59037, http://drupal.org/files/application.pdf, >> http://of-code.blogspot.de/2007/08/soc-experience-introduction.html >> >> It is not required to be extremely long, but it should cover what exactly >> you intent to do and how long the single steps will (roughly) take. >> >> Tobias