Implementing an algorithm in place of gomp 'auto'

2019-03-01 Thread 김규래
Hello everyone, 
I'm an EE student at Sogang University Korea.
I have recently submitted a paper on parallel loop scheduling algorithm and had 
to modify libgomp a bit in the process.
It is known that the...
 
 

/* For now map to schedule(static), later on we could play with feedback driven 
choice. */


... comment has been in place for quite a while.
I'm curious if experimentally implementing a non-OMP-standard algorithm for the 
auto policy would be interesting.
Specifically, one of the Tapering Self-Scheduling, Bold Self-Scheduling, 
Factoring Self-Scheduling [1] algorithms.
If that is the case, I would like to propose this project for GSoC 2019.
 
Great regards from Korea,
Ray Kim
[1] "OpenMP Loop Scheduling Revisited: Making a Case for More Schedules", IWOMP 
2018, https://arxiv.org/abs/1809.03188


Re: Implementing an algorithm in place of gomp 'auto'

2019-03-01 Thread Jakub Jelinek
On Fri, Mar 01, 2019 at 11:40:50PM +0900, 김규래 wrote:
> Hello everyone, 
> I'm an EE student at Sogang University Korea.
> I have recently submitted a paper on parallel loop scheduling algorithm and 
> had to modify libgomp a bit in the process.
> It is known that the...
>  
>  
> 
> /* For now map to schedule(static), later on we could play with feedback 
> driven choice. */
> 
> 
> ... comment has been in place for quite a while.
> I'm curious if experimentally implementing a non-OMP-standard algorithm for 
> the auto policy would be interesting.
> Specifically, one of the Tapering Self-Scheduling, Bold Self-Scheduling, 
> Factoring Self-Scheduling [1] algorithms.
> If that is the case, I would like to propose this project for GSoC 2019.

For auto yes, if it makes good results, though it would be nice to use some
smarts on the compiler side to decide if the loop is a good candidate for
such scheduling, static scheduling has the major advantage that it doesn't
need to enter the runtime library (except for querying number of threads and
thread number, but those can be cached from earlier and barrier at the end
if not nowait) and for many common loops where the work in each iteration is
really constant it also gives best results.

Especially for the case when schedule clause is not present at all, we'd
need to do a very good job at the compiler side to use some dynamic
algorithm when it is likely it will be beneficial.  Unfortunately at least
right now the lowering to either static scheduling or a library call is done
quite early, before inlining etc., and for best decisions we'd need to
either defer it to far later, or be able to convert a library call back to
static scheduling.

Another option is to add further schedules as extensions (say starting with
gnu_ prefix or similar).

Jakub


Re: Implementing an algorithm in place of gomp 'auto'

2019-03-01 Thread 김규래
Nice to meet you Jacob.
 
> Another option is to add further schedules as extensions (say starting with
> gnu_ prefix or similar). 
 
In this case, I believe that modifying the frontend would be necessary?
Last time I looked, it seemed that adding a new scheduling keyword would 
require quite some work.
Also, out of curiosity, is there any plan to add work stealing (affinity 
schedules) to gomp?
The clang implementation seem have work stealing.
 
Ray Kim 
 
-Original Message-
From: "Jakub Jelinek"
To: "김규래";
Cc: "gcc Mailing List";
Sent: 2019-03-02 (토) 02:32:29 (GMT+09:00)
Subject: Re: Implementing an algorithm in place of gomp 'auto'
 
On Fri, Mar 01, 2019 at 11:40:50PM +0900, 김규래 wrote:
> Hello everyone,
> I'm an EE student at Sogang University Korea.
> I have recently submitted a paper on parallel loop scheduling algorithm and 
> had to modify libgomp a bit in the process.
> It is known that the...
>  
>  
>
> /* For now map to schedule(static), later on we could play with feedback 
> driven choice. */
>
>
> ... comment has been in place for quite a while.
> I'm curious if experimentally implementing a non-OMP-standard algorithm for 
> the auto policy would be interesting.
> Specifically, one of the Tapering Self-Scheduling, Bold Self-Scheduling, 
> Factoring Self-Scheduling [1] algorithms.
> If that is the case, I would like to propose this project for GSoC 2019.

For auto yes, if it makes good results, though it would be nice to use some
smarts on the compiler side to decide if the loop is a good candidate for
such scheduling, static scheduling has the major advantage that it doesn't
need to enter the runtime library (except for querying number of threads and
thread number, but those can be cached from earlier and barrier at the end
if not nowait) and for many common loops where the work in each iteration is
really constant it also gives best results.

Especially for the case when schedule clause is not present at all, we'd
need to do a very good job at the compiler side to use some dynamic
algorithm when it is likely it will be beneficial.  Unfortunately at least
right now the lowering to either static scheduling or a library call is done
quite early, before inlining etc., and for best decisions we'd need to
either defer it to far later, or be able to convert a library call back to
static scheduling.

Another option is to add further schedules as extensions (say starting with
gnu_ prefix or similar).

Jakub


Re: Implementing an algorithm in place of gomp 'auto'

2019-03-01 Thread Jakub Jelinek
On Sat, Mar 02, 2019 at 02:40:35AM +0900, 김규래 wrote:
> Nice to meet you Jacob.
>  
> > Another option is to add further schedules as extensions (say starting with
> > gnu_ prefix or similar). 
>  
> In this case, I believe that modifying the frontend would be necessary?

Yes.

> Last time I looked, it seemed that adding a new scheduling keyword would 
> require quite some work.

Not that much.

> Also, out of curiosity, is there any plan to add work stealing (affinity 
> schedules) to gomp?

It is on the wish list, but I'm afraid I won't have cycles for it in the
next year or two at least (once GCC 9 is released, I need to work on the
remaining OpenMP 5.0 features).  Of course if somebody implements it and submits
and it is of reasonable quality/performance, it will be accepted.

> The clang implementation seem have work stealing.

clang doesn't have its own runtime library, you mean the Intel OpenMP
library, right?

Jakub


Re: A bug in vrp_meet?

2019-03-01 Thread Qing Zhao
Jeff,

thanks a lot for the reply.

this is really helpful.

I double checked the dumped intermediate file for pass “dom3", and located the 
following for _152:

BEFORE the pass “dom3”, there is no _152, the corresponding Block looks 
like:

   [local count: 12992277]:
  _98 = (int) ufcMSR_52(D);
  k_105 = (sword) ufcMSR_52(D);
  i_49 = _98 > 0 ? k_105 : 0;

***During the pass “doms”,  _152 is generated as following:

Optimizing block #4
….
Visiting statement:
i_49 = _98 > 0 ? k_105 : 0;
Meeting
  [0, 65535]
and
  [0, 0]
to
  [0, 65535]
Intersecting
  [0, 65535]
and
  [0, 65535]
to
  [0, 65535]
Optimizing statement i_49 = _98 > 0 ? k_105 : 0;
  Replaced 'k_105' with variable '_98'
gimple_simplified to _152 = MAX_EXPR <_98, 0>;
i_49 = _152;
  Folded to: i_49 = _152;
LKUP STMT i_49 = _152
 ASGN i_49 = _152

then bb 4 becomes:

   [local count: 12992277]:
  _98 = (int) ufcMSR_52(D);
  k_105 = _98;
  _152 = MAX_EXPR <_98, 0>;
  i_49 = _152;

and all the i_49 are replaced with _152. 

However, the value range info for _152 doesnot reflect the one for i_49, it 
keeps as UNDEFINED. 

is this the root problem?  

thanks a lot.

Qing

> On Feb 28, 2019, at 1:54 PM, Jeff Law  wrote:
> 
> On 2/28/19 10:05 AM, Qing Zhao wrote:
>> Hi,
>> 
>> I have been debugging a runtime error caused by value range propagation. and 
>> finally located to the following gcc routine:
>> 
>> vrp_meet_1 in gcc/tree-vrp.c
>> 
>> 
>> /* Meet operation for value ranges.  Given two value ranges VR0 and
>>   VR1, store in VR0 a range that contains both VR0 and VR1.  This
>>   may not be the smallest possible such range.  */
>> 
>> static void 
>> vrp_meet_1 (value_range *vr0, const value_range *vr1)
>> {
>>  value_range saved;
>> 
>>  if (vr0->type == VR_UNDEFINED)
>>{
>>  set_value_range (vr0, vr1->type, vr1->min, vr1->max, vr1->equiv);
>>  return;
>>}
>> 
>>  if (vr1->type == VR_UNDEFINED)
>>{
>>  /* VR0 already has the resulting range.  */
>>  return;
>>}
>> 
>> 
>> In the above, when one of vr0 or vr1 is VR_UNDEFINED,  the meet result of 
>> these two will be  the other VALUE. 
>> 
>> This seems not correct to me. 
> That's the optimistic nature of VRP.  It's generally desired behavior.
> 
>> 
>> For example, the following is the located incorrect value range propagation: 
>>  (portion from the dump file *.181t.dom3)
>> 
>> 
>> Visiting PHI node: i_83 = PHI <_152(20), 0(22)>
>>Argument #0 (20 -> 10 executable)
>>_152: UNDEFINED
>>Argument #1 (22 -> 10 executable)
>>0: [0, 0]
>> Meeting
>>  UNDEFINED
>> and
>>  [0, 0]
>> to
>>  [0, 0]
>> Intersecting
>>  [0, 0]
>> and
>>  [0, 65535]
>> to
>>  [0, 0]
>> 
>> 
>> 
>> In the above, “i_83” is defined as PHI <_152(20), 0(22)>,   the 1st argument 
>> is UNDEFINED at this time(but its value range definitely is NOT [0,0]),
>> and the 2nd argument is 0.
> If it's value is undefined then it can be any value we want.  We choose
> to make it equal to the other argument.
> 
> If VRP later finds that _152 changes, then it will go back and
> reevaluate the PHI.  That's one of the basic design principles of the
> optimistic propagators.
> 
>> 
>> “vrp_meet” generate a VR_RANGE with [0,0] for “i_83” based on the current 
>> algorithm.  Obviously, this result VR_RANGE with [0,0] does NOT 
>> contain the value ranges for _152. 
>> 
>> the result of “vrp_meet” is Not correct.  and this incorrect value range 
>> result finally caused the runtime error. 
>> 
>> I ‘d like to modify the vrp_meet_1 as following:
>> 
>> 
>> static void 
>> vrp_meet_1 (value_range *vr0, const value_range *vr1)
>> {
>>  value_range saved;
>> 
>>  if (vr0->type == VR_UNDEFINED)
>>{
>>  /* VR0 already has the resulting range. */
>>  return;
>>}
>> 
>>  if (vr1->type == VR_UNDEFINED)
>>{
>>  set_value_range_to_undefined (vr0)
>> return;
>>}
>> 
>> 
>> let me know your opinion.
>> 
>> thanks a lot for the help.
> I think we (Richi and I) went through this about a year ago and the
> conclusion was we should be looking at how you're getting into the
> vrp_meet with the VR_UNDEFINED.
> 
> If it happens because the user's code has an undefined use, then, the
> consensus has been to go ahead and optimize it.
> 
> If it happens for any other reason, then it's likely a bug in GCC.  We
> had a couple of these when we made EVRP a re-usable module and started
> exploiting its data in the jump threader.
> 
> So you need to work backwards from this point to figure out how you got
> here.
> 
> jeff



Re: A bug in vrp_meet?

2019-03-01 Thread Richard Biener
On March 1, 2019 6:49:20 PM GMT+01:00, Qing Zhao  wrote:
>Jeff,
>
>thanks a lot for the reply.
>
>this is really helpful.
>
>I double checked the dumped intermediate file for pass “dom3", and
>located the following for _152:
>
>BEFORE the pass “dom3”, there is no _152, the corresponding Block
>looks like:
>
>   [local count: 12992277]:
>  _98 = (int) ufcMSR_52(D);
>  k_105 = (sword) ufcMSR_52(D);
>  i_49 = _98 > 0 ? k_105 : 0;
>
>***During the pass “doms”,  _152 is generated as following:
>
>Optimizing block #4
>….
>Visiting statement:
>i_49 = _98 > 0 ? k_105 : 0;
>Meeting
>  [0, 65535]
>and
>  [0, 0]
>to
>  [0, 65535]
>Intersecting
>  [0, 65535]
>and
>  [0, 65535]
>to
>  [0, 65535]
>Optimizing statement i_49 = _98 > 0 ? k_105 : 0;
>  Replaced 'k_105' with variable '_98'
>gimple_simplified to _152 = MAX_EXPR <_98, 0>;
>i_49 = _152;
>  Folded to: i_49 = _152;
>LKUP STMT i_49 = _152
> ASGN i_49 = _152
>
>then bb 4 becomes:
>
>   [local count: 12992277]:
>  _98 = (int) ufcMSR_52(D);
>  k_105 = _98;
>  _152 = MAX_EXPR <_98, 0>;
>  i_49 = _152;
>
>and all the i_49 are replaced with _152. 
>
>However, the value range info for _152 doesnot reflect the one for
>i_49, it keeps as UNDEFINED. 
>
>is this the root problem?  

It looks like DOM fails to visit stmts generated by simplification. Can you 
open a bug report with a testcase?

Richard. 

>thanks a lot.
>
>Qing
>
>> On Feb 28, 2019, at 1:54 PM, Jeff Law  wrote:
>> 
>> On 2/28/19 10:05 AM, Qing Zhao wrote:
>>> Hi,
>>> 
>>> I have been debugging a runtime error caused by value range
>propagation. and finally located to the following gcc routine:
>>> 
>>> vrp_meet_1 in gcc/tree-vrp.c
>>> 
>>> 
>>> /* Meet operation for value ranges.  Given two value ranges VR0 and
>>>   VR1, store in VR0 a range that contains both VR0 and VR1.  This
>>>   may not be the smallest possible such range.  */
>>> 
>>> static void 
>>> vrp_meet_1 (value_range *vr0, const value_range *vr1)
>>> {
>>>  value_range saved;
>>> 
>>>  if (vr0->type == VR_UNDEFINED)
>>>{
>>>  set_value_range (vr0, vr1->type, vr1->min, vr1->max,
>vr1->equiv);
>>>  return;
>>>}
>>> 
>>>  if (vr1->type == VR_UNDEFINED)
>>>{
>>>  /* VR0 already has the resulting range.  */
>>>  return;
>>>}
>>> 
>>> 
>>> In the above, when one of vr0 or vr1 is VR_UNDEFINED,  the meet
>result of these two will be  the other VALUE. 
>>> 
>>> This seems not correct to me. 
>> That's the optimistic nature of VRP.  It's generally desired
>behavior.
>> 
>>> 
>>> For example, the following is the located incorrect value range
>propagation:  (portion from the dump file *.181t.dom3)
>>> 
>>> 
>>> Visiting PHI node: i_83 = PHI <_152(20), 0(22)>
>>>Argument #0 (20 -> 10 executable)
>>>_152: UNDEFINED
>>>Argument #1 (22 -> 10 executable)
>>>0: [0, 0]
>>> Meeting
>>>  UNDEFINED
>>> and
>>>  [0, 0]
>>> to
>>>  [0, 0]
>>> Intersecting
>>>  [0, 0]
>>> and
>>>  [0, 65535]
>>> to
>>>  [0, 0]
>>> 
>>> 
>>> 
>>> In the above, “i_83” is defined as PHI <_152(20), 0(22)>,   the 1st
>argument is UNDEFINED at this time(but its value range definitely is
>NOT [0,0]),
>>> and the 2nd argument is 0.
>> If it's value is undefined then it can be any value we want.  We
>choose
>> to make it equal to the other argument.
>> 
>> If VRP later finds that _152 changes, then it will go back and
>> reevaluate the PHI.  That's one of the basic design principles of the
>> optimistic propagators.
>> 
>>> 
>>> “vrp_meet” generate a VR_RANGE with [0,0] for “i_83” based on the
>current algorithm.  Obviously, this result VR_RANGE with [0,0] does NOT
>
>>> contain the value ranges for _152. 
>>> 
>>> the result of “vrp_meet” is Not correct.  and this incorrect value
>range result finally caused the runtime error. 
>>> 
>>> I ‘d like to modify the vrp_meet_1 as following:
>>> 
>>> 
>>> static void 
>>> vrp_meet_1 (value_range *vr0, const value_range *vr1)
>>> {
>>>  value_range saved;
>>> 
>>>  if (vr0->type == VR_UNDEFINED)
>>>{
>>>  /* VR0 already has the resulting range. */
>>>  return;
>>>}
>>> 
>>>  if (vr1->type == VR_UNDEFINED)
>>>{
>>>  set_value_range_to_undefined (vr0)
>>> return;
>>>}
>>> 
>>> 
>>> let me know your opinion.
>>> 
>>> thanks a lot for the help.
>> I think we (Richi and I) went through this about a year ago and the
>> conclusion was we should be looking at how you're getting into the
>> vrp_meet with the VR_UNDEFINED.
>> 
>> If it happens because the user's code has an undefined use, then, the
>> consensus has been to go ahead and optimize it.
>> 
>> If it happens for any other reason, then it's likely a bug in GCC. 
>We
>> had a couple of these when we made EVRP a re-usable module and
>started
>> exploiting its data in the jump threader.
>> 
>> So you need to work backwards from this point to figure out how you
>got
>> here.
>> 
>> jeff



GSoC

2019-03-01 Thread Ahmed Ashraf
Hello,
I am Ahmed Ashraf, a first-year student at the Computer Engineering
Department, Faculty of Engineering - Cairo University in Egypt.

In the first semester, I was asked to create a CAD as the Circuits course
project. The requirements were to simulate simple AC circuits that contain
only (independent voltage sources, independent current sources, dependent
voltage source, dependent current source, resistors, capacitors, and
inductors).

The main problem which faced me was the math behind the code. I needed to
solve a system of equations in more than 3 variables and the equations
contain complex numbers and the results also were complex but the "math.h"
and "complex.h" were very poor and useless to my project.

So, it's a great chance for me to participate in a project will improve
that libraries to help other people who need it to have a better experiment
in using them.

I am very excited about participating with you in the project "  Add new
math.h and complex.h functions as built-ins". I studied 2 courses in
programming the first was about the basics of C++ and the second was about
Object-Oriented Programming using C++. Currently, I'm studying the third
course about "Data Structures and Algorithms" also using C++ but recently I
started to learn Python from an MIT course and "Data Structures and
Algorithms" specialization in Coursera. Participating in that will be a
very good step in my career closer to my goal.

I also studied some courses in mathematics such that three courses in
Calculus, in different areas like "Differential Calculus", "Integral
Calculus", "Differential Equations" and " Partial Derivatives and Laplace".
Currently, I am studying a course on "Discrete Mathematics" and " Discrete
Mathematics" specialization in Coursera.

I am also training on problem-solving skills constantly from online judges
such that Codeforces.

So, is this enough to have a chance in your project? and how I can improve
my self to be qualified to participate with you before the application
deadline?

GitHub Account 


Re: A bug in vrp_meet?

2019-03-01 Thread Qing Zhao


> On Mar 1, 2019, at 1:25 PM, Richard Biener  wrote:
> 
> On March 1, 2019 6:49:20 PM GMT+01:00, Qing Zhao  > wrote:
>> Jeff,
>> 
>> thanks a lot for the reply.
>> 
>> this is really helpful.
>> 
>> I double checked the dumped intermediate file for pass “dom3", and
>> located the following for _152:
>> 
>> BEFORE the pass “dom3”, there is no _152, the corresponding Block
>> looks like:
>> 
>>  [local count: 12992277]:
>> _98 = (int) ufcMSR_52(D);
>> k_105 = (sword) ufcMSR_52(D);
>> i_49 = _98 > 0 ? k_105 : 0;
>> 
>> ***During the pass “doms”,  _152 is generated as following:
>> 
>> Optimizing block #4
>> ….
>> Visiting statement:
>> i_49 = _98 > 0 ? k_105 : 0;
>> Meeting
>> [0, 65535]
>> and
>> [0, 0]
>> to
>> [0, 65535]
>> Intersecting
>> [0, 65535]
>> and
>> [0, 65535]
>> to
>> [0, 65535]
>> Optimizing statement i_49 = _98 > 0 ? k_105 : 0;
>> Replaced 'k_105' with variable '_98'
>> gimple_simplified to _152 = MAX_EXPR <_98, 0>;
>> i_49 = _152;
>> Folded to: i_49 = _152;
>> LKUP STMT i_49 = _152
>>  ASGN i_49 = _152
>> 
>> then bb 4 becomes:
>> 
>>  [local count: 12992277]:
>> _98 = (int) ufcMSR_52(D);
>> k_105 = _98;
>> _152 = MAX_EXPR <_98, 0>;
>> i_49 = _152;
>> 
>> and all the i_49 are replaced with _152. 
>> 
>> However, the value range info for _152 doesnot reflect the one for
>> i_49, it keeps as UNDEFINED. 
>> 
>> is this the root problem?  
> 
> It looks like DOM fails to visit stmts generated by simplification. Can you 
> open a bug report with a testcase?

The problem is, It took me quite some time in order to come up with a small and 
independent testcase for this problem,
a little bit change made the error disappear.  

do you have any suggestion on this?  or can you give me some hint on how to fix 
this in DOM?  then I can try the fix on my side?

Thanks a lot.

Qing


> 
> Richard. 
> 



Re: GSoC

2019-03-01 Thread Dmitry Mikushin
Linear equation solvers is not the scope of . There are many
packages serving this particular purpose, try looking into e.g. LAPACK.

Kind regards,
- Dmitry.


пт, 1 мар. 2019 г. в 23:09, Ahmed Ashraf :

> Hello,
> I am Ahmed Ashraf, a first-year student at the Computer Engineering
> Department, Faculty of Engineering - Cairo University in Egypt.
>
> In the first semester, I was asked to create a CAD as the Circuits course
> project. The requirements were to simulate simple AC circuits that contain
> only (independent voltage sources, independent current sources, dependent
> voltage source, dependent current source, resistors, capacitors, and
> inductors).
>
> The main problem which faced me was the math behind the code. I needed to
> solve a system of equations in more than 3 variables and the equations
> contain complex numbers and the results also were complex but the "math.h"
> and "complex.h" were very poor and useless to my project.
>
> So, it's a great chance for me to participate in a project will improve
> that libraries to help other people who need it to have a better experiment
> in using them.
>
> I am very excited about participating with you in the project "  Add new
> math.h and complex.h functions as built-ins". I studied 2 courses in
> programming the first was about the basics of C++ and the second was about
> Object-Oriented Programming using C++. Currently, I'm studying the third
> course about "Data Structures and Algorithms" also using C++ but recently I
> started to learn Python from an MIT course and "Data Structures and
> Algorithms" specialization in Coursera. Participating in that will be a
> very good step in my career closer to my goal.
>
> I also studied some courses in mathematics such that three courses in
> Calculus, in different areas like "Differential Calculus", "Integral
> Calculus", "Differential Equations" and " Partial Derivatives and Laplace".
> Currently, I am studying a course on "Discrete Mathematics" and " Discrete
> Mathematics" specialization in Coursera.
>
> I am also training on problem-solving skills constantly from online judges
> such that Codeforces.
>
> So, is this enough to have a chance in your project? and how I can improve
> my self to be qualified to participate with you before the application
> deadline?
>
> GitHub Account 
>


gcc-8-20190301 is now available

2019-03-01 Thread gccadmin
Snapshot gcc-8-20190301 is now available on
  ftp://gcc.gnu.org/pub/gcc/snapshots/8-20190301/
and on various mirrors, see http://gcc.gnu.org/mirrors.html for details.

This snapshot has been generated from the GCC 8 SVN branch
with the following options: svn://gcc.gnu.org/svn/gcc/branches/gcc-8-branch 
revision 269333

You'll find:

 gcc-8-20190301.tar.xzComplete GCC

  SHA256=b15b6f48d0abe55cd1123ed5aaeb90ddc97bc6f3443634924c1a16b289b0e2b0
  SHA1=28c41616c92976d08af020a6ca61a1069caecff1

Diffs from 8-20190222 are available in the diffs/ subdirectory.

When a particular snapshot is ready for public consumption the LATEST-8
link is updated and a message is sent to the gcc list.  Please do not use
a snapshot before it has been announced that way.