GSoC openMP task scheduling Advice

2013-04-26 Thread guray.ozen
Hi,

I'm MSc High-Performance Computing student at Polytechnic University
of Catalonia(BarcelonaTech). I'm interesting openmp task scheduling
optimization or openmp 3.1 facility taskyield.

@For Task scheduling
I'm using mercurium compiler already at my university because the
compiler was developed by my university. In fact i'm working on
openACC integration at mercurium. However i haven't enough knowledge
nanos scheduler.

I read 3 articles that you recommend related task scheduling. Cilk and
work-first scheduling algorithms makes sense for me with cut-off
mechanism. I suppose cut-off mechanism will decide on run-time.

Now I'm trying to understand which task scheduler is better in some
cases at nanos scheduler
(https://pm.bsc.es/projects/nanox/wiki/UserManual/Schedule). I'll
analyze result with PARAVER profiling tool.

However i am not sure for this topics. Should i develop a new
task-scheduler? otherwise should I optimize existing scheduler?


King regards,


Güray Özen
Polytechnic University of Catalonia


Re: GSoC openMP task scheduling Advice

2013-04-29 Thread guray.ozen
Dear All,

I'm trying to speed up the OpenMP implementation in GCC and Intel
compiler. I'm working on multisort because i thought it has too much
branches and i applied tree model task. I used Paraver and Extrae to
profile program After that, I executed them on a machine Minotauro
supercomputer in top500 with Intel Xeon E5649 E5649 (6-Core, each core
has 2 threads) a 2.53 GHz.

The following report shows the OpenMP in GCC scheduling vs Intel C Compiler
https://github.com/grypp/gcc-gsoc-taskscheduler/raw/master/report.pdf

And Here is my omp code
https://github.com/grypp/gcc-gsoc-taskscheduler/blob/master/multisort-omp.c

I thought gcc tasks/threads waiting too much on the idle than intel
compiler's threads.


In addition to I tried to profiling nanos task scheduler with
mercurium compiler. I used 3 scheduler types in nanos.  These're
respectively
-Work-First (WF)
-Breadth-First (BF)
-Cilk (CILK)

I run this program on computer has 12 processors. The following image
shows the trace of schedulers:
https://raw.github.com/grypp/gcc-gsoc-taskscheduler/master/nns.png

And Here is my ompss code
https://github.com/grypp/gcc-gsoc-taskscheduler/blob/master/multisort-ompss-tree.c

In addition to i got speed up interestingly breadth-first more than others.
Texecution times:
-Work-First (WF):  ~0.5
-Breadth-First (BF): ~0.22
-Cilk (CILK): ~0.49

I also read this mail
"http://gcc.gnu.org/ml/gcc/2011-04/msg00040.html";. I suppose lazy task
creatation algorithm was implemented on gcc. I do not know. I wonder?


Lazy task creation algorithm may good. But task scheduling is not
good. Currently, I'm planning to change task scheduling algorithm with
BF.
But i am not sure it's good or not. In addition to i'll ask some idea
related task scheduling my professor who is the main writer papers
that gave you and is director of the barcelona supercomputer center.

Are there any advice your?

Regards,
Güray Özen
Polytechnic University of Catalonia



2013/4/26 guray.ozen :
> Hi,
>
> I'm MSc High-Performance Computing student at Polytechnic University
> of Catalonia(BarcelonaTech). I'm interesting openmp task scheduling
> optimization or openmp 3.1 facility taskyield.
>
> @For Task scheduling
> I'm using mercurium compiler already at my university because the
> compiler was developed by my university. In fact i'm working on
> openACC integration at mercurium. However i haven't enough knowledge
> nanos scheduler.
>
> I read 3 articles that you recommend related task scheduling. Cilk and
> work-first scheduling algorithms makes sense for me with cut-off
> mechanism. I suppose cut-off mechanism will decide on run-time.
>
> Now I'm trying to understand which task scheduler is better in some
> cases at nanos scheduler
> (https://pm.bsc.es/projects/nanox/wiki/UserManual/Schedule). I'll
> analyze result with PARAVER profiling tool.
>
> However i am not sure for this topics. Should i develop a new
> task-scheduler? otherwise should I optimize existing scheduler?
>
>
> King regards,
>
>
> Güray Özen
> Polytechnic University of Catalonia


Re: GSoC openMP task scheduling Advice

2013-05-01 Thread guray.ozen
Dear All,

Thank you for your reply Tobias.

By the way Mr Jakup I hope my approach is make sense for you.

I changed GOMP_SPINCOUNT factor and i got speedup more than.
I attached my trace that was profiled extrae and paraver. Light blue
mean idle, Dark blue mean running, Yellow scheduling, Fork/Join. First
trace belongs to intel trace with default configuration. 2nd trace
with spincount=10, 3rd spincount=100, 5rd with spincount=infinity. In
addition to you can see running time as nanosecond bottom right
corner.
https://raw.github.com/grypp/gcc-gsoc-taskscheduler/master/openmp.png

In my opinion gcc is little bit slower tha intel because of task
scheduling. Some threads (for example thread number 4,9,10 in the
multisort-omp-12-spin10 trace or 7,9,11 threads in the
multisort-omp-12-infinity trace) waiting too much to other threads.
You can see my trace image,

My advice i want to change or add task scheduler algorithm. The first
thing I really want to add work-stealing mechanism. In this way i can
decrease task waiting time and i can provide load-balancing. Also task
stealing has a strategy that Untied task can steal parents task. For
example parent task-stealing is very good for parallel-multisort
algortihm or recursive tasks. If parents task cannot be stolen then
the default task stealing techniques is used. In addition to i can
change task scheduler as follows : when a task is created, the
creating task is suspended and the executing thread switches to the
newly created task. When a task is suspended the task is placed in a
per thread local pool. So that i can provide better data locality.

I'm working on this topics but i am not sure my idea is good or not.
Probably i need change my way. Please can you help me? I would like to
work with OpenMP in gcc.

Regards,

Güray Özen
Polytechnic University of Catalonia



2013/4/30 Tobias Burnus :
> guray.ozen wrote:
>>
>> I thought gcc tasks/threads waiting too much on the idle than intel
>> compiler's threads.
>
>
> Regarding busy waits, you could try to tune the values of the GOMP_SPINCOUNT
> environment variable. Search for "@node GOMP_SPINCOUNT" in
> http://gcc.gnu.org/viewcvs/gcc/branches/gomp-4_0-branch/libgomp/libgomp.texi?view=co&content-type=text%2Fplain
> for details.
>
> If you have enough cores which are available, there shouldn't be a problem
> with idle. (Except with tasks where one could argue that the threads should
> do task stealing instead.)
>
> Tobias,
> who leaves the other questions to Jakub


Re: GSoC openMP task scheduling Advice

2013-05-14 Thread guray.ozen
Hi All,

I applied gsoc for openMP taks scheduling and my advice may cover
taskyield facility. Currently i have some idea for taskyield. i think
i can add something. Therefore i wonder GCC mentor related about
openMP was announced? or should i wait until "student acceptance"?

Regards,
Güray Özen
Polytechnic University of Catalonia



2013/5/5 guray.ozen :
> Dear Tobias,
>
> Thank for your interest.
>
> Yes actually my Master program managed by BSC. So generally my professors
> work from BSC or Intel. I know BSC has a nanos group and the group works
> generally task scheduling. But i don't know any people from this group. If i
> get openmp task project, I'll meet them and i can request help. maybe they
> have good trick to implement my advice. Or they may improve my advice.
>
> Finally I applied gsoc. I hope I'll be accepted :) my advice can cover
> openmp3.1 taskyield facility beside task scheduling optimization. Of course
> task scheduling contains task stealing. Is like to much i can work on gcc
> for openmp.
>
> Regards,
>
> On May 3, 2013 1:08 AM, "Tobias Burnus"  wrote:
>>
>> Dear Güray,
>>
>> seemingly, Jakub is currently rather busy - let me try to answer the
>> questions, given that the GSoC dead line is tomorrow.
>>
>>> I'm MSc High-Performance Computing student at Polytechnic University
>>> of Catalonia(BarcelonaTech).
>>
>>
>> For curiosity, do you have something to do with the BSC? Or some other
>> on-site person, who is knowledgeable in OpenMP scheduling.
>>
>>
>> (I know that it is a separate entity, but it could be that you have still
>> some relation with them - or one of your professors.)
>>
>>> However i am not sure for this topics. Should i develop a new
>>> task-scheduler? otherwise should I optimize existing scheduler?
>>
>>
>> I think it would be useful to try a different talk scheduler, especially
>> with regards to task stealing. See http://www.cs.unc.edu/~olivier/ross11.pdf
>> for a comparison.
>>
>> Something similar like the task scheduling of OpenUH's DEQUEUE PER_THREAD1
>> with a doubly linked chain would be useful to try. (That's alledgedly
>> similar to the extension to the Qthreads multithreading library using the
>> ROSE compiler, used in Olivier's paper.)
>>
>> I think having some implementation of that scheduling and comparing it
>> with the current one would be useful - then one can decide whether the new
>> one is better or the current one or whether both should be available for the
>> user to switch between the two.
>>
>> In any case, it is useful to study the literature about the details - but
>> then get quickly starting with the implementation to make it possible to
>> really play with the scheduler in GCC.
>>
>>
>> Am 01.05.2013 18:50, schrieb guray.ozen:
>>>
>>> My advice i want to change or add task scheduler algorithm. The first
>>> thing I really want to add work-stealing mechanism. In this way i can
>>> decrease task waiting time and i can provide load-balancing.
>>
>>
>> Yes, that's in line with the idea above. I think one has to look at the
>> literature at the algorithm - and then try to get it implemented.
>>
>> Regarding the GSoC application, it would be good to give some success
>> criteria and time line - that helps later when implementing and also with
>> the application.
>>
>> For some ideas how to write the application, see:
>> http://drupal.org/node/59037, http://drupal.org/files/application.pdf,
>> http://of-code.blogspot.de/2007/08/soc-experience-introduction.html
>>
>> It is not required to be extremely long, but it should cover what exactly
>> you intent to do and how long the single steps will (roughly) take.
>>
>> Tobias