alignof(type, field); sizeof(type, field); typeof(type, field): getting type information on nested field

2019-07-02 Thread Yann Droneaud
Hi,

I'm sometime in need to "probe" the size, the type, (and less often the
alignment) of a field inside a structure.

In such case I have to write "ugly" thing like

  struct A
  {
struct
{
  type_t t;
} B;
  };

  typeof(((struct A *)NULL)->B.t) V;

It would have been some much pleasing to have 2 parameters version of
sizeof(), typeof(), and alignof(), which would look like a lot like
offsetof() usage:

  typeof(struct A, B.t) V;

  if (sizeof(struct A, B.t) != sizeof(long)) ... ;

  if (alignof(struct A, B.t) < alignof(long)) ... ;

As sizeof is an operator, this is not straightforward to implement.

Currently sizeof (0, variable); is a valid construct, giving the size
of  "variable".

Hence, turning sizeof() to a variadic function-like thing would break
existing code.

Maybe it would possible to add such function-like operator as _Sizeof()
? And have a header #define'ing sizeof(...) as _Sizeof() for code
wanting the new behavior ? 

Is this possible ? Is such feature was already considered ?

Regards.

-- 
Yann Droneaud
OPTEYA




Re: alignof(type, field); sizeof(type, field); typeof(type, field): getting type information on nested field

2019-07-02 Thread Jonathan Wakely
On Tue, 2 Jul 2019 at 08:57, Yann Droneaud  wrote:
>
> Hi,
>
> I'm sometime in need to "probe" the size, the type, (and less often the
> alignment) of a field inside a structure.
>
> In such case I have to write "ugly" thing like
>
>   struct A
>   {
> struct
> {
>   type_t t;
> } B;
>   };
>
>   typeof(((struct A *)NULL)->B.t) V;
>
> It would have been some much pleasing to have 2 parameters version of
> sizeof(), typeof(), and alignof(), which would look like a lot like
> offsetof() usage:
>
>   typeof(struct A, B.t) V;
>
>   if (sizeof(struct A, B.t) != sizeof(long)) ... ;
>
>   if (alignof(struct A, B.t) < alignof(long)) ... ;
>
> As sizeof is an operator, this is not straightforward to implement.
>
> Currently sizeof (0, variable); is a valid construct, giving the size
> of  "variable".
>
> Hence, turning sizeof() to a variadic function-like thing would break
> existing code.
>
> Maybe it would possible to add such function-like operator as _Sizeof()
> ? And have a header #define'ing sizeof(...) as _Sizeof() for code
> wanting the new behavior ?

That seems like a terrible idea.

> Is this possible ? Is such feature was already considered ?

#define SIZEOF_MEM(type, member) sizeof(((type*)0).member)
#define TYPEOF_MEM(type, member) __typeof__(((type*)0).member)
#define ALIGNOF_MEM(type, member) _Alignof(((type*)0).member)


Re: RFC on a new optimization

2019-07-02 Thread Richard Biener
On Mon, Jul 1, 2019 at 11:58 PM Gary Oblock  wrote:
>
> I've been looking at trying to optimize the performance of code for
> programs that use functions like qsort where a function is passed the
> name of a function and some constant parameter(s).
>
> The function qsort itself is an excellent example of what I'm trying to show
> what I want to do, except for being in a library, so please ignore
> that while I proceed assuming that that qsort is not in a library.  In
> qsort the user passes in a size of the array elements and comparison
> function name in addition to the location of the array to be sorted. I
> noticed that for a given call site that the first two are always the
> same so why not create a specialized version of qsort that eliminates
> them and internally uses a constant value for the size parameter and
> does a direct call instead of an indirect call. The later lets the
> comparison function code be inlined.
>
> This seems to me to be a very useful optimization where heavy use is
> made of this programming idiom. I saw a 30%+ overall improvement when
> I specialized a function like this by hand in an application.
>
> My question is does anything inside gcc do something similar? I don't
> want to reinvent the wheel and I want to do something that plays
> nicely with the rest of gcc so it makes it into real world. Note, I
> should mention that I'm an experienced compiler developed and I'm
> planning on adding this optimization unless it's obvious from the
> ensuing discussion that either it's a bad idea or that it's a matter
> of simply tweaking gcc a bit to get this optimization to occur.

GCC performs intraprocedural constant propagation (IPA-CP) and
this should catch your case already.  The IPA-CP function cloning
might have too constrained limits (on code bloat) to apply on a
specific testcase but all functionality for the qsort case should
be available.

Richard.

> Thanks,
>
> Gary Oblock


Re: RFC on a new optimization

2019-07-02 Thread Martin Jambor
Hi,

On Tue, Jul 02 2019, Richard Biener wrote:
> On Mon, Jul 1, 2019 at 11:58 PM Gary Oblock  wrote:
>>
>> I've been looking at trying to optimize the performance of code for
>> programs that use functions like qsort where a function is passed the
>> name of a function and some constant parameter(s).
>>
>> The function qsort itself is an excellent example of what I'm trying to show
>> what I want to do, except for being in a library, so please ignore
>> that while I proceed assuming that that qsort is not in a library.  In
>> qsort the user passes in a size of the array elements and comparison
>> function name in addition to the location of the array to be sorted. I
>> noticed that for a given call site that the first two are always the
>> same so why not create a specialized version of qsort that eliminates
>> them and internally uses a constant value for the size parameter and
>> does a direct call instead of an indirect call. The later lets the
>> comparison function code be inlined.
>>
>> This seems to me to be a very useful optimization where heavy use is
>> made of this programming idiom. I saw a 30%+ overall improvement when
>> I specialized a function like this by hand in an application.
>>
>> My question is does anything inside gcc do something similar? I don't
>> want to reinvent the wheel and I want to do something that plays
>> nicely with the rest of gcc so it makes it into real world. Note, I
>> should mention that I'm an experienced compiler developed and I'm
>> planning on adding this optimization unless it's obvious from the
>> ensuing discussion that either it's a bad idea or that it's a matter
>> of simply tweaking gcc a bit to get this optimization to occur.
>
> GCC performs intraprocedural constant propagation (IPA-CP) and
> this should catch your case already.  The IPA-CP function cloning
> might have too constrained limits (on code bloat) to apply on a
> specific testcase but all functionality for the qsort case should
> be available.

At least in 505.mcf/605.mcf we do inline the comparator to the qsort
function - and in order to do that, IPA-CP actually creates two clones,
one for each of the two used comparators in the benchmark, see:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84149

I'll be happy to see any examples where it fails to do the right thing.

Martin


Re: Doubts regarding the _Dependent_ptr keyword

2019-07-02 Thread Ramana Radhakrishnan
On Tue, Jul 2, 2019 at 1:29 AM Akshat Garg  wrote:
>
> On Tue, Jun 25, 2019 at 9:49 PM Akshat Garg  wrote:
>>
>> On Tue, Jun 25, 2019 at 4:04 PM Ramana Radhakrishnan 
>>  wrote:
>>>
>> [CCing gcc mailing list]
>>
>> So, shall I start looking over the pointer optimizations only and see what 
>> information we may be needed on the same examples in the IR itself?
>>
>> - Akshat
>
> I have coded an example where equality comparison kills dependency from the 
> document P0190R4 as shown below :
>
> 1. struct rcutest rt = {1, 2, 3};
> 2. void thread0 ()
> 3. {
> 4. rt.a = -42;
> 5. rt.b = -43;
> 6. rt.c = -44;
> 7. rcu_assign_pointer(gp, &rt);
> 8. }
> 9.
> 10. void thread1 ()
> 11. {
> 12.int i = -1;
> 13.int j = -1;
> 14._Dependent_ptr struct rcutest *p;
> 15.
> 16.p = rcu_dereference(gp);
> 17.j = p->a;
> 18.   if (p == &rt)
> 19.i = p->b;  /*Dependency breaking point*/
> 20.   else if(p)
> 21.   i = p->c;
> 22.   assert(i<0);
> 23.   assert(j<0);
> 24. }
> The gimple unoptimized code produced for lines 17-24 is shown below
>
> 1. if (p_16 == &rt)
> 2. goto ; [INV]
> 3.   else
> 4.goto ; [INV]
> 5.
> 6.   :
> 7.  i_19 = p_16->b;
> 8.  goto ; [INV]
> 9.
> 10.   :
> 11.  if (p_16 != 0B)
> 12.goto ; [INV]
> 13.  else
> 14.goto ; [INV]
> 15.
> 16.   :
> 17.  i_18 = p_16->c;
> 18.
> 19.   :
> 20.  # i_7 = PHI 
> 21.  _3 = i_7 < 0;
> 22.  _4 = (int) _3;
> 23.  assert (_4);
> 24.  _5 = j_17 < 0;
> 25.  _6 = (int) _5;
> 26.  assert (_6);
> 27.  return;
>
> The optimized code after -O1 is applied for the same lines is hown below :
>
> 1. if (_2 == &rt)
> 2.goto ; [30.00%]
> 3. else
> 4.goto ; [70.00%]
> 5.
> 6.   [local count: 322122547]:
> 7.   i_12 = rt.b;
> 8.   goto ; [100.00%]
> 9.
> 10.   [local count: 751619277]:
> 11.   if (_1 != 0)
> 12.   goto ; [50.00%]
> 13.   else
> 14.goto ; [50.00%]
> 15.
> 16.   [local count: 375809638]:
> 17.   i_11 = MEM[(dependent_ptr struct rcutest *)_2].c;
> 18.
> 19.[local count: 1073741824]:
> 20.  # i_7 = PHI 
> 21.   _3 = i_7 < 0;
> 22.   _4 = (int) _3;
> 23.   assert (_4);
> 24.  _5 = j_10 < 0;
> 25.  _6 = (int) _5;
> 26.   assert (_6);
> 27.   return;
>
> Statement 19 in the program gets converted from  i_19 = p_16->b; in line 7 in 
> unoptimized code to i_12 = rt.b; in line 7 in optimized code which breaks the 
> dependency chain. We need to figure out the pass that does that and put some 
> handling code in there for the _dependent_ptr qualified pointers. Passing 
> simply -fipa-pure-const, -fguess-branch-probability or any other option alone 
> does not produce the optimized code that breaks the dependency. But applying 
> -O1, i.e., allowing all the optimizations does so. As passes are applied in a 
> certain order, we need to figure out upto what passes, the code remains same 
> and after what pass the dependency does not holds. So, we need to check the 
> translated code after every pass.
>

It's worth figuring out what passes are doing this - however the worry
I have is that every pass now needs to be handling this case with
respect to pointer attributes. Is there some place that you are
storing said information and what is the transitive nature of
assignments with these attributes ?

regards
Ramana


> Does this sounds like a workable plan for ? Let me know your thoughts. If 
> this sounds good then, we can do this for all the optimizations that may kill 
> the dependencies at somepoint.
>
> -Akshat


Re: Doubts regarding the _Dependent_ptr keyword

2019-07-02 Thread Akshat Garg
On Tue, 2 Jul, 2019, 3:52 PM Ramana Radhakrishnan, <
ramana@googlemail.com> wrote:

> On Tue, Jul 2, 2019 at 1:29 AM Akshat Garg  wrote:
> >
> > On Tue, Jun 25, 2019 at 9:49 PM Akshat Garg  wrote:
> >>
> >> On Tue, Jun 25, 2019 at 4:04 PM Ramana Radhakrishnan <
> ramana@googlemail.com> wrote:
> >>>
> >> [CCing gcc mailing list]
> >>
> >> So, shall I start looking over the pointer optimizations only and see
> what information we may be needed on the same examples in the IR itself?
> >>
> >> - Akshat
> >
> > I have coded an example where equality comparison kills dependency from
> the document P0190R4 as shown below :
> >
> > 1. struct rcutest rt = {1, 2, 3};
> > 2. void thread0 ()
> > 3. {
> > 4. rt.a = -42;
> > 5. rt.b = -43;
> > 6. rt.c = -44;
> > 7. rcu_assign_pointer(gp, &rt);
> > 8. }
> > 9.
> > 10. void thread1 ()
> > 11. {
> > 12.int i = -1;
> > 13.int j = -1;
> > 14._Dependent_ptr struct rcutest *p;
> > 15.
> > 16.p = rcu_dereference(gp);
> > 17.j = p->a;
> > 18.   if (p == &rt)
> > 19.i = p->b;  /*Dependency breaking point*/
> > 20.   else if(p)
> > 21.   i = p->c;
> > 22.   assert(i<0);
> > 23.   assert(j<0);
> > 24. }
> > The gimple unoptimized code produced for lines 17-24 is shown below
> >
> > 1. if (p_16 == &rt)
> > 2. goto ; [INV]
> > 3.   else
> > 4.goto ; [INV]
> > 5.
> > 6.   :
> > 7.  i_19 = p_16->b;
> > 8.  goto ; [INV]
> > 9.
> > 10.   :
> > 11.  if (p_16 != 0B)
> > 12.goto ; [INV]
> > 13.  else
> > 14.goto ; [INV]
> > 15.
> > 16.   :
> > 17.  i_18 = p_16->c;
> > 18.
> > 19.   :
> > 20.  # i_7 = PHI 
> > 21.  _3 = i_7 < 0;
> > 22.  _4 = (int) _3;
> > 23.  assert (_4);
> > 24.  _5 = j_17 < 0;
> > 25.  _6 = (int) _5;
> > 26.  assert (_6);
> > 27.  return;
> >
> > The optimized code after -O1 is applied for the same lines is hown below
> :
> >
> > 1. if (_2 == &rt)
> > 2.goto ; [30.00%]
> > 3. else
> > 4.goto ; [70.00%]
> > 5.
> > 6.   [local count: 322122547]:
> > 7.   i_12 = rt.b;
> > 8.   goto ; [100.00%]
> > 9.
> > 10.   [local count: 751619277]:
> > 11.   if (_1 != 0)
> > 12.   goto ; [50.00%]
> > 13.   else
> > 14.goto ; [50.00%]
> > 15.
> > 16.   [local count: 375809638]:
> > 17.   i_11 = MEM[(dependent_ptr struct rcutest *)_2].c;
> > 18.
> > 19.[local count: 1073741824]:
> > 20.  # i_7 = PHI 
> > 21.   _3 = i_7 < 0;
> > 22.   _4 = (int) _3;
> > 23.   assert (_4);
> > 24.  _5 = j_10 < 0;
> > 25.  _6 = (int) _5;
> > 26.   assert (_6);
> > 27.   return;
> >
> > Statement 19 in the program gets converted from  i_19 = p_16->b; in line
> 7 in unoptimized code to i_12 = rt.b; in line 7 in optimized code which
> breaks the dependency chain. We need to figure out the pass that does that
> and put some handling code in there for the _dependent_ptr qualified
> pointers. Passing simply -fipa-pure-const, -fguess-branch-probability or
> any other option alone does not produce the optimized code that breaks the
> dependency. But applying -O1, i.e., allowing all the optimizations does so.
> As passes are applied in a certain order, we need to figure out upto what
> passes, the code remains same and after what pass the dependency does not
> holds. So, we need to check the translated code after every pass.
> >
>
> It's worth figuring out what passes are doing this - however the worry
> I have is that every pass now needs to be handling this case with
> respect to pointer attributes. Is there some place that you are
> storing said information and what is the transitive nature of
> assignments with these attributes ?
>
> regards
> Ramana
>
I don't understand what information to store, can you explain? I was
thinking of putting some code inside the pass, which breaks the dependency
chains, which will avoid this type of conversions when it finds out  that
the pointer is _Dependent_ptr qualified otherwise don't do anything. Can
you let me know what I am missing on?

In transitive nature, are you talking about the dependent pointer being
assigned to some non-dependent pointer and after further assigned to some
other pointer?

>
> > Does this sounds like a workable plan for ? Let me know your thoughts.
> If this sounds good then, we can do this for all the optimizations that may
> kill the dependencies at somepoint.
> >
> > -Akshat
>


Re: Doubts regarding the _Dependent_ptr keyword

2019-07-02 Thread Ramana Radhakrishnan
>>
>> It's worth figuring out what passes are doing this - however the worry
>> I have is that every pass now needs to be handling this case with
>> respect to pointer attributes. Is there some place that you are
>> storing said information and what is the transitive nature of
>> assignments with these attributes ?
>>
>> regards
>> Ramana
>
> I don't understand what information to store, can you explain? I was thinking 
> of putting some code inside the pass, which breaks the dependency chains, 
> which will avoid this type of conversions when it finds out  that the pointer 
> is _Dependent_ptr qualified otherwise don't do anything. Can you let me know 
> what I am missing on?
>

That the pointer has an attribute that it's a dependent pointer . How
do you expect the later passes to detect it has a dependent_ptr
attribute attached to it ?

> In transitive nature, are you talking about the dependent pointer being 
> assigned to some non-dependent pointer and after further assigned to some 
> other pointer?


Yes. What's the expected behaviour as per the document ?

Ramana
>>
>>
>> > Does this sounds like a workable plan for ? Let me know your thoughts. If 
>> > this sounds good then, we can do this for all the optimizations that may 
>> > kill the dependencies at somepoint.
>> >
>> > -Akshat


Re: Doubts regarding the _Dependent_ptr keyword

2019-07-02 Thread Paul E. McKenney
On Tue, Jul 02, 2019 at 12:01:00PM +0100, Ramana Radhakrishnan wrote:
> >>
> >> It's worth figuring out what passes are doing this - however the worry
> >> I have is that every pass now needs to be handling this case with
> >> respect to pointer attributes. Is there some place that you are
> >> storing said information and what is the transitive nature of
> >> assignments with these attributes ?
> >>
> >> regards
> >> Ramana
> >
> > I don't understand what information to store, can you explain? I was 
> > thinking of putting some code inside the pass, which breaks the dependency 
> > chains, which will avoid this type of conversions when it finds out  that 
> > the pointer is _Dependent_ptr qualified otherwise don't do anything. Can 
> > you let me know what I am missing on?
> >
> 
> That the pointer has an attribute that it's a dependent pointer . How
> do you expect the later passes to detect it has a dependent_ptr
> attribute attached to it ?
> 
> > In transitive nature, are you talking about the dependent pointer being 
> > assigned to some non-dependent pointer and after further assigned to some 
> > other pointer?
> 
> Yes. What's the expected behaviour as per the document ?

Once a user-created non-dependent pointer is assigned to, it is OK to
break the dependency.

Or am I missing the point here?

Thanx, Paul

> Ramana
> >>
> >>
> >> > Does this sounds like a workable plan for ? Let me know your thoughts. 
> >> > If this sounds good then, we can do this for all the optimizations that 
> >> > may kill the dependencies at somepoint.
> >> >
> >> > -Akshat
> 



Re: Doubts regarding the _Dependent_ptr keyword

2019-07-02 Thread Ramana Radhakrishnan
On Tue, Jul 2, 2019 at 1:38 PM Paul E. McKenney  wrote:

>
> Once a user-created non-dependent pointer is assigned to, it is OK to
> break the dependency.

Ok, that's good.
>
> Or am I missing the point here?

I was just trying to make sure we were on the same page. I wonder if
marking this volatile would be sufficient for prototyping. I suspect
we would need another flag somewhere which someone with gimple
knowledge might be able to help us with.

regards
Ramana

>
> Thanx, Paul
>
> > Ramana
> > >>
> > >>
> > >> > Does this sounds like a workable plan for ? Let me know your thoughts. 
> > >> > If this sounds good then, we can do this for all the optimizations 
> > >> > that may kill the dependencies at somepoint.
> > >> >
> > >> > -Akshat
> >
>


Re: Doubts regarding the _Dependent_ptr keyword

2019-07-02 Thread Paul E. McKenney
On Tue, Jul 02, 2019 at 02:15:55PM +0100, Ramana Radhakrishnan wrote:
> On Tue, Jul 2, 2019 at 1:38 PM Paul E. McKenney  wrote:
> 
> >
> > Once a user-created non-dependent pointer is assigned to, it is OK to
> > break the dependency.
> 
> Ok, that's good.
> >
> > Or am I missing the point here?
> 
> I was just trying to make sure we were on the same page. I wonder if
> marking this volatile would be sufficient for prototyping. I suspect
> we would need another flag somewhere which someone with gimple
> knowledge might be able to help us with.

I expect that marking it as volatile would do the trick.  ;-)

Thanx, Paul

> regards
> Ramana
> 
> >
> > Thanx, Paul
> >
> > > Ramana
> > > >>
> > > >>
> > > >> > Does this sounds like a workable plan for ? Let me know your 
> > > >> > thoughts. If this sounds good then, we can do this for all the 
> > > >> > optimizations that may kill the dependencies at somepoint.
> > > >> >
> > > >> > -Akshat
> > >
> >
> 



Re: Doubts regarding the _Dependent_ptr keyword

2019-07-02 Thread Jason Merrill
On Mon, Jul 1, 2019 at 8:59 PM Paul E. McKenney  wrote:
>
> On Tue, Jul 02, 2019 at 05:58:48AM +0530, Akshat Garg wrote:
> > On Tue, Jun 25, 2019 at 9:49 PM Akshat Garg  wrote:
> >
> > > On Tue, Jun 25, 2019 at 4:04 PM Ramana Radhakrishnan <
> > > ramana@googlemail.com> wrote:
> > >
> > >> On Tue, Jun 25, 2019 at 11:03 AM Akshat Garg  wrote:
> > >> >
> > >> > As we have some working front-end code for _Dependent_ptr, What should
> > >> we do next? What I understand, we can start adding the library for
> > >> dependent_ptr and its functions for C corresponding to the ones we 
> > >> created
> > >> as C++ template library. Then, after that, we can move on to generating 
> > >> the
> > >> assembly code part.
> > >> >
> > >>
> > >>
> > >> I think the next step is figuring out how to model the Dependent
> > >> pointer information in the IR and figuring out what optimizations to
> > >> allow or not with that information. At this point , I suspect we need
> > >> a plan on record and have the conversation upstream on the lists.
> > >>
> > >> I think we need to put down a plan on record.
> > >>
> > >> Ramana
> > >
> > > [CCing gcc mailing list]
> > >
> > > So, shall I start looking over the pointer optimizations only and see what
> > > information we may be needed on the same examples in the IR itself?
> > >
> > > - Akshat
> > >
> > I have coded an example where equality comparison kills dependency from the
> > document P0190R4 as shown below :
> >
> > 1. struct rcutest rt = {1, 2, 3};
> > 2. void thread0 ()
> > 3. {
> > 4. rt.a = -42;
> > 5. rt.b = -43;
> > 6. rt.c = -44;
> > 7. rcu_assign_pointer(gp, &rt);
> > 8. }
> > 9.
> > 10. void thread1 ()
> > 11. {
> > 12.int i = -1;
> > 13.int j = -1;
> > 14._Dependent_ptr struct rcutest *p;
> > 15.
> > 16.p = rcu_dereference(gp);
> > 17.j = p->a;
> > 18.   if (p == &rt)
> > 19.i = p->b;  /*Dependency breaking point*/
> > 20.   else if(p)
> > 21.   i = p->c;
> > 22.   assert(i<0);
> > 23.   assert(j<0);
> > 24. }
> > The gimple unoptimized code produced for lines 17-24 is shown below
> >
> > 1. if (p_16 == &rt)
> > 2. goto ; [INV]
> > 3.   else
> > 4.goto ; [INV]
> > 5.
> > 6.   :
> > 7.  i_19 = p_16->b;
> > 8.  goto ; [INV]
> > 9.
> > 10.   :
> > 11.  if (p_16 != 0B)
> > 12.goto ; [INV]
> > 13.  else
> > 14.goto ; [INV]
> > 15.
> > 16.   :
> > 17.  i_18 = p_16->c;
> > 18.
> > 19.   :
> > 20.  # i_7 = PHI 
> > 21.  _3 = i_7 < 0;
> > 22.  _4 = (int) _3;
> > 23.  assert (_4);
> > 24.  _5 = j_17 < 0;
> > 25.  _6 = (int) _5;
> > 26.  assert (_6);
> > 27.  return;
> >
> > The optimized code after -O1 is applied for the same lines is hown below :
> >
> > 1. if (_2 == &rt)
> > 2.goto ; [30.00%]
> > 3. else
> > 4.goto ; [70.00%]
> > 5.
> > 6.   [local count: 322122547]:
> > 7.   i_12 = rt.b;
> > 8.   goto ; [100.00%]
> > 9.
> > 10.   [local count: 751619277]:
> > 11.   if (_1 != 0)
> > 12.   goto ; [50.00%]
> > 13.   else
> > 14.goto ; [50.00%]
> > 15.
> > 16.   [local count: 375809638]:
> > 17.   i_11 = MEM[(dependent_ptr struct rcutest *)_2].c;
> > 18.
> > 19.[local count: 1073741824]:
> > 20.  # i_7 = PHI 
> > 21.   _3 = i_7 < 0;
> > 22.   _4 = (int) _3;
> > 23.   assert (_4);
> > 24.  _5 = j_10 < 0;
> > 25.  _6 = (int) _5;
> > 26.   assert (_6);
> > 27.   return;
>
> Good show on tracing this through!
>
> > Statement 19 in the program gets converted from  i_19 = p_16->b; in line 7
> > in unoptimized code to i_12 = rt.b; in line 7 in optimized code which
> > breaks the dependency chain. We need to figure out the pass that does that
> > and put some handling code in there for the _dependent_ptr qualified
> > pointers. Passing simply -fipa-pure-const, -fguess-branch-probability or
> > any other option alone does not produce the optimized code that breaks the
> > dependency. But applying -O1, i.e., allowing all the optimizations does so.
> > As passes are applied in a certain order, we need to figure out up to what
> > passes, the code remains same and after what pass the dependency does not
> > holds. So, we need to check the translated code after every pass.
> >
> > Does this sounds like a workable plan for ? Let me know your thoughts. If
> > this sounds good then, we can do this for all the optimizations that may
> > kill the dependencies at somepoint.
>
> I don't know of a better plan.
>
> My usual question...  Is there some way to script the checking of the
> translated code at the end of each pass?

The usual way to check the output of an optimization pass is by
dumping the intermediate code at that point and matching the dump
against a regexp, as in the tree-ssa directories in the testsuite.
-fdump-tree-all will dump after all the gimple optimization passes,
and you can look through them until you find the breakage.

Jason


Re: Doubts regarding the _Dependent_ptr keyword

2019-07-02 Thread Richard Biener
On July 2, 2019 5:36:08 PM GMT+02:00, Jason Merrill  wrote:
>On Mon, Jul 1, 2019 at 8:59 PM Paul E. McKenney 
>wrote:
>>
>> On Tue, Jul 02, 2019 at 05:58:48AM +0530, Akshat Garg wrote:
>> > On Tue, Jun 25, 2019 at 9:49 PM Akshat Garg 
>wrote:
>> >
>> > > On Tue, Jun 25, 2019 at 4:04 PM Ramana Radhakrishnan <
>> > > ramana@googlemail.com> wrote:
>> > >
>> > >> On Tue, Jun 25, 2019 at 11:03 AM Akshat Garg 
>wrote:
>> > >> >
>> > >> > As we have some working front-end code for _Dependent_ptr,
>What should
>> > >> we do next? What I understand, we can start adding the library
>for
>> > >> dependent_ptr and its functions for C corresponding to the ones
>we created
>> > >> as C++ template library. Then, after that, we can move on to
>generating the
>> > >> assembly code part.
>> > >> >
>> > >>
>> > >>
>> > >> I think the next step is figuring out how to model the Dependent
>> > >> pointer information in the IR and figuring out what
>optimizations to
>> > >> allow or not with that information. At this point , I suspect we
>need
>> > >> a plan on record and have the conversation upstream on the
>lists.
>> > >>
>> > >> I think we need to put down a plan on record.
>> > >>
>> > >> Ramana
>> > >
>> > > [CCing gcc mailing list]
>> > >
>> > > So, shall I start looking over the pointer optimizations only and
>see what
>> > > information we may be needed on the same examples in the IR
>itself?
>> > >
>> > > - Akshat
>> > >
>> > I have coded an example where equality comparison kills dependency
>from the
>> > document P0190R4 as shown below :
>> >
>> > 1. struct rcutest rt = {1, 2, 3};
>> > 2. void thread0 ()
>> > 3. {
>> > 4. rt.a = -42;
>> > 5. rt.b = -43;
>> > 6. rt.c = -44;
>> > 7. rcu_assign_pointer(gp, &rt);
>> > 8. }
>> > 9.
>> > 10. void thread1 ()
>> > 11. {
>> > 12.int i = -1;
>> > 13.int j = -1;
>> > 14._Dependent_ptr struct rcutest *p;
>> > 15.
>> > 16.p = rcu_dereference(gp);
>> > 17.j = p->a;
>> > 18.   if (p == &rt)
>> > 19.i = p->b;  /*Dependency breaking point*/
>> > 20.   else if(p)
>> > 21.   i = p->c;
>> > 22.   assert(i<0);
>> > 23.   assert(j<0);
>> > 24. }
>> > The gimple unoptimized code produced for lines 17-24 is shown below
>> >
>> > 1. if (p_16 == &rt)
>> > 2. goto ; [INV]
>> > 3.   else
>> > 4.goto ; [INV]
>> > 5.
>> > 6.   :
>> > 7.  i_19 = p_16->b;
>> > 8.  goto ; [INV]
>> > 9.
>> > 10.   :
>> > 11.  if (p_16 != 0B)
>> > 12.goto ; [INV]
>> > 13.  else
>> > 14.goto ; [INV]
>> > 15.
>> > 16.   :
>> > 17.  i_18 = p_16->c;
>> > 18.
>> > 19.   :
>> > 20.  # i_7 = PHI 
>> > 21.  _3 = i_7 < 0;
>> > 22.  _4 = (int) _3;
>> > 23.  assert (_4);
>> > 24.  _5 = j_17 < 0;
>> > 25.  _6 = (int) _5;
>> > 26.  assert (_6);
>> > 27.  return;
>> >
>> > The optimized code after -O1 is applied for the same lines is hown
>below :
>> >
>> > 1. if (_2 == &rt)
>> > 2.goto ; [30.00%]
>> > 3. else
>> > 4.goto ; [70.00%]
>> > 5.
>> > 6.   [local count: 322122547]:
>> > 7.   i_12 = rt.b;
>> > 8.   goto ; [100.00%]
>> > 9.
>> > 10.   [local count: 751619277]:
>> > 11.   if (_1 != 0)
>> > 12.   goto ; [50.00%]
>> > 13.   else
>> > 14.goto ; [50.00%]
>> > 15.
>> > 16.   [local count: 375809638]:
>> > 17.   i_11 = MEM[(dependent_ptr struct rcutest *)_2].c;
>> > 18.
>> > 19.[local count: 1073741824]:
>> > 20.  # i_7 = PHI 
>> > 21.   _3 = i_7 < 0;
>> > 22.   _4 = (int) _3;
>> > 23.   assert (_4);
>> > 24.  _5 = j_10 < 0;
>> > 25.  _6 = (int) _5;
>> > 26.   assert (_6);
>> > 27.   return;
>>
>> Good show on tracing this through!
>>
>> > Statement 19 in the program gets converted from  i_19 = p_16->b; in
>line 7
>> > in unoptimized code to i_12 = rt.b; in line 7 in optimized code
>which
>> > breaks the dependency chain. We need to figure out the pass that
>does that
>> > and put some handling code in there for the _dependent_ptr
>qualified
>> > pointers.

Wtf should this be for?  A type qualifier is certainly not going to work. 

Richard. 


 Passing simply -fipa-pure-const,
>-fguess-branch-probability or
>> > any other option alone does not produce the optimized code that
>breaks the
>> > dependency. But applying -O1, i.e., allowing all the optimizations
>does so.
>> > As passes are applied in a certain order, we need to figure out up
>to what
>> > passes, the code remains same and after what pass the dependency
>does not
>> > holds. So, we need to check the translated code after every pass.
>> >
>> > Does this sounds like a workable plan for ? Let me know your
>thoughts. If
>> > this sounds good then, we can do this for all the optimizations
>that may
>> > kill the dependencies at somepoint.
>>
>> I don't know of a better plan.
>>
>> My usual question...  Is there some way to script the checking of the
>> translated code at the end of each pass?
>
>The usual way to check the output of an optimization pass is by
>dumping the intermediate code at that point and matching the dump
>against a regexp, as in the tree-ssa directories in the testsuite.

Re: [EXT] Re: RFC on a new optimization

2019-07-02 Thread Gary Oblock
On 7/2/19 2:45 AM, Richard Biener wrote:
> External Email
>
> --
> On Mon, Jul 1, 2019 at 11:58 PM Gary Oblock  wrote:
>> I've been looking at trying to optimize the performance of code for
>> programs that use functions like qsort where a function is passed the
>> name of a function and some constant parameter(s).
>>
>> The function qsort itself is an excellent example of what I'm trying to show
>> what I want to do, except for being in a library, so please ignore
>> that while I proceed assuming that that qsort is not in a library.  In
>> qsort the user passes in a size of the array elements and comparison
>> function name in addition to the location of the array to be sorted. I
>> noticed that for a given call site that the first two are always the
>> same so why not create a specialized version of qsort that eliminates
>> them and internally uses a constant value for the size parameter and
>> does a direct call instead of an indirect call. The later lets the
>> comparison function code be inlined.
>>
>> This seems to me to be a very useful optimization where heavy use is
>> made of this programming idiom. I saw a 30%+ overall improvement when
>> I specialized a function like this by hand in an application.
>>
>> My question is does anything inside gcc do something similar? I don't
>> want to reinvent the wheel and I want to do something that plays
>> nicely with the rest of gcc so it makes it into real world. Note, I
>> should mention that I'm an experienced compiler developed and I'm
>> planning on adding this optimization unless it's obvious from the
>> ensuing discussion that either it's a bad idea or that it's a matter
>> of simply tweaking gcc a bit to get this optimization to occur.
> GCC performs intraprocedural constant propagation (IPA-CP) and
> this should catch your case already.  The IPA-CP function cloning
> might have too constrained limits (on code bloat) to apply on a
> specific testcase but all functionality for the qsort case should
> be available.
>
> Richard.
>
>> Thanks,
>>
>> Gary Oblock
Richard, I'm planning on using profile based heuristics
that are fairly conservative. However, I'll also also let the user
have access to a parameter to relax the heuristics to the degree
they desire if they want to do so.

Gary


Re: Doubts regarding the _Dependent_ptr keyword

2019-07-02 Thread Akshat Garg
On Tue, Jul 2, 2019 at 8:40 PM Paul E. McKenney 
wrote:

> On Tue, Jul 02, 2019 at 02:15:55PM +0100, Ramana Radhakrishnan wrote:
> > On Tue, Jul 2, 2019 at 1:38 PM Paul E. McKenney 
> wrote:
> >
> > >
> > > Once a user-created non-dependent pointer is assigned to, it is OK to
> > > break the dependency.
> >
> > Ok, that's good.
> > >
> > > Or am I missing the point here?
> >
> > I was just trying to make sure we were on the same page. I wonder if
> > marking this volatile would be sufficient for prototyping. I suspect
> > we would need another flag somewhere which someone with gimple
> > knowledge might be able to help us with.
>
> I expect that marking it as volatile would do the trick.  ;-)
>
> Thanx, Paul
>
So, marking this pointer as volatile will not allow the compiler to
modify/optimize the statements, the pointer is appearing in. And we don't
need to push any other code inside any of the passes. Due to this, we want
to automatically say those dependent pointers are volatile and introduce a
new flag for this. Am I getting you guys correctly? Kindly, let me know?

Akshat

>
> > regards
> > Ramana
> >
> > >
> > > Thanx, Paul
> > >
> > > > Ramana
> > > > >>
> > > > >>
> > > > >> > Does this sounds like a workable plan for ? Let me know your
> thoughts. If this sounds good then, we can do this for all the
> optimizations that may kill the dependencies at somepoint.
> > > > >> >
> > > > >> > -Akshat
> > > >
> > >
> >
>
>


Re: Doubts regarding the _Dependent_ptr keyword

2019-07-02 Thread Akshat Garg
On Tue, Jul 2, 2019 at 9:06 PM Jason Merrill  wrote:

> On Mon, Jul 1, 2019 at 8:59 PM Paul E. McKenney 
> wrote:
> >
> > On Tue, Jul 02, 2019 at 05:58:48AM +0530, Akshat Garg wrote:
> > > On Tue, Jun 25, 2019 at 9:49 PM Akshat Garg  wrote:
> > >
> > > > On Tue, Jun 25, 2019 at 4:04 PM Ramana Radhakrishnan <
> > > > ramana@googlemail.com> wrote:
> > > >
> > > >> On Tue, Jun 25, 2019 at 11:03 AM Akshat Garg 
> wrote:
> > > >> >
> > > >> > As we have some working front-end code for _Dependent_ptr, What
> should
> > > >> we do next? What I understand, we can start adding the library for
> > > >> dependent_ptr and its functions for C corresponding to the ones we
> created
> > > >> as C++ template library. Then, after that, we can move on to
> generating the
> > > >> assembly code part.
> > > >> >
> > > >>
> > > >>
> > > >> I think the next step is figuring out how to model the Dependent
> > > >> pointer information in the IR and figuring out what optimizations to
> > > >> allow or not with that information. At this point , I suspect we
> need
> > > >> a plan on record and have the conversation upstream on the lists.
> > > >>
> > > >> I think we need to put down a plan on record.
> > > >>
> > > >> Ramana
> > > >
> > > > [CCing gcc mailing list]
> > > >
> > > > So, shall I start looking over the pointer optimizations only and
> see what
> > > > information we may be needed on the same examples in the IR itself?
> > > >
> > > > - Akshat
> > > >
> > > I have coded an example where equality comparison kills dependency
> from the
> > > document P0190R4 as shown below :
> > >
> > > 1. struct rcutest rt = {1, 2, 3};
> > > 2. void thread0 ()
> > > 3. {
> > > 4. rt.a = -42;
> > > 5. rt.b = -43;
> > > 6. rt.c = -44;
> > > 7. rcu_assign_pointer(gp, &rt);
> > > 8. }
> > > 9.
> > > 10. void thread1 ()
> > > 11. {
> > > 12.int i = -1;
> > > 13.int j = -1;
> > > 14._Dependent_ptr struct rcutest *p;
> > > 15.
> > > 16.p = rcu_dereference(gp);
> > > 17.j = p->a;
> > > 18.   if (p == &rt)
> > > 19.i = p->b;  /*Dependency breaking point*/
> > > 20.   else if(p)
> > > 21.   i = p->c;
> > > 22.   assert(i<0);
> > > 23.   assert(j<0);
> > > 24. }
> > > The gimple unoptimized code produced for lines 17-24 is shown below
> > >
> > > 1. if (p_16 == &rt)
> > > 2. goto ; [INV]
> > > 3.   else
> > > 4.goto ; [INV]
> > > 5.
> > > 6.   :
> > > 7.  i_19 = p_16->b;
> > > 8.  goto ; [INV]
> > > 9.
> > > 10.   :
> > > 11.  if (p_16 != 0B)
> > > 12.goto ; [INV]
> > > 13.  else
> > > 14.goto ; [INV]
> > > 15.
> > > 16.   :
> > > 17.  i_18 = p_16->c;
> > > 18.
> > > 19.   :
> > > 20.  # i_7 = PHI 
> > > 21.  _3 = i_7 < 0;
> > > 22.  _4 = (int) _3;
> > > 23.  assert (_4);
> > > 24.  _5 = j_17 < 0;
> > > 25.  _6 = (int) _5;
> > > 26.  assert (_6);
> > > 27.  return;
> > >
> > > The optimized code after -O1 is applied for the same lines is hown
> below :
> > >
> > > 1. if (_2 == &rt)
> > > 2.goto ; [30.00%]
> > > 3. else
> > > 4.goto ; [70.00%]
> > > 5.
> > > 6.   [local count: 322122547]:
> > > 7.   i_12 = rt.b;
> > > 8.   goto ; [100.00%]
> > > 9.
> > > 10.   [local count: 751619277]:
> > > 11.   if (_1 != 0)
> > > 12.   goto ; [50.00%]
> > > 13.   else
> > > 14.goto ; [50.00%]
> > > 15.
> > > 16.   [local count: 375809638]:
> > > 17.   i_11 = MEM[(dependent_ptr struct rcutest *)_2].c;
> > > 18.
> > > 19.[local count: 1073741824]:
> > > 20.  # i_7 = PHI 
> > > 21.   _3 = i_7 < 0;
> > > 22.   _4 = (int) _3;
> > > 23.   assert (_4);
> > > 24.  _5 = j_10 < 0;
> > > 25.  _6 = (int) _5;
> > > 26.   assert (_6);
> > > 27.   return;
> >
> > Good show on tracing this through!
> >
> > > Statement 19 in the program gets converted from  i_19 = p_16->b; in
> line 7
> > > in unoptimized code to i_12 = rt.b; in line 7 in optimized code which
> > > breaks the dependency chain. We need to figure out the pass that does
> that
> > > and put some handling code in there for the _dependent_ptr qualified
> > > pointers. Passing simply -fipa-pure-const, -fguess-branch-probability
> or
> > > any other option alone does not produce the optimized code that breaks
> the
> > > dependency. But applying -O1, i.e., allowing all the optimizations
> does so.
> > > As passes are applied in a certain order, we need to figure out up to
> what
> > > passes, the code remains same and after what pass the dependency does
> not
> > > holds. So, we need to check the translated code after every pass.
> > >
> > > Does this sounds like a workable plan for ? Let me know your thoughts.
> If
> > > this sounds good then, we can do this for all the optimizations that
> may
> > > kill the dependencies at somepoint.
> >
> > I don't know of a better plan.
> >
> > My usual question...  Is there some way to script the checking of the
> > translated code at the end of each pass?
>
> The usual way to check the output of an optimization pass is by
> dumping the intermediate code at that point and matching the dump
> agains

Re: [PATCH] Add .gnu.lto_.lto section.

2019-07-02 Thread Jeff Law
On 7/1/19 4:59 AM, Martin Liška wrote:
> Hi.
> 
> Ok, so there's a version with added ChangeLog that survives regression tests.
> 
> Ready to be installed?
> Thanks,
> Martin
> 
> 
> 0001-Add-.gnu.lto_.lto-section.patch
> 
> From e6745583dc4b7f5543878c0a25498e818531f73e Mon Sep 17 00:00:00 2001
> From: Martin Liska 
> Date: Fri, 21 Jun 2019 12:14:04 +0200
> Subject: [PATCH 1/2] Add .gnu.lto_.lto section.
> 
> gcc/ChangeLog:
> 
> 2019-07-01  Martin Liska  
> 
>   * lto-section-in.c (lto_get_section_data): Add "lto" section.
>   * lto-section-out.c (lto_destroy_simple_output_block): Never
>   compress LTO_section_lto section.
>   * lto-streamer-out.c (produce_asm): Do not set major_version
>   and minor_version.
>   (lto_output_toplevel_asms): Likewise.
>   (produce_lto_section): New function.
>   (lto_output): Call produce_lto_section.
>   (lto_write_mode_table): Do not set major_version and
>   minor_version.
>   (produce_asm_for_decls): Likewise.
>   * lto-streamer.h (enum lto_section_type): Add LTO_section_lto
>   type.
>   (struct lto_header): Remove.
>   (struct lto_section): New struct.
>   (struct lto_simple_header): Do not inherit from lto_header.
>   (struct lto_file_decl_data): Add lto_section_header field.
OK
jeff


Re: [PATCH 2/2] Add zstd support for LTO bytecode compression.

2019-07-02 Thread Jeff Law
On 7/1/19 5:00 AM, Martin Liška wrote:
> Hi.
> 
> This is updated version of the zstd patch that should handle all what Joseph
> pointed out.
> 
> Patch can bootstrap on x86_64-linux-gnu and survives regression tests.
> 
> Ready to be installed?
> Thanks,
> Martin
> 
> 
> 0002-Add-zstd-support-for-LTO-bytecode-compression.patch
> 
> From 5d2006c9c4d481f4083d5a591327ee64847b0bf7 Mon Sep 17 00:00:00 2001
> From: Martin Liska 
> Date: Thu, 20 Jun 2019 10:08:17 +0200
> Subject: [PATCH 2/2] Add zstd support for LTO bytecode compression.
> 
> gcc/ChangeLog:
> 
> 2019-07-01  Martin Liska  
> 
>   * Makefile.in: Define ZSTD_LIB.
>   * common.opt: Adjust compression level
>   to support also zstd levels.
>   * config.in: Regenerate.
>   * configure: Likewise.
>   * configure.ac: Add --with-zstd and --with-zstd-include options
>   and detect ZSTD.
>   * doc/install.texi: Mention zstd dependency.
>   * gcc.c: Print supported LTO compression algorithms.
>   * lto-compress.c (lto_normalized_zstd_level): Likewise.
>   (lto_compression_zstd): Likewise.
>   (lto_uncompression_zstd): Likewise.
>   (lto_end_compression): Dispatch in between zlib and zstd.
>   (lto_compression_zlib): Mark with ATTRIBUTE_UNUSED.
>   (lto_uncompression_zlib): Make it static.
>   * lto-compress.h (lto_end_uncompression): Fix GNU coding style.
>   * lto-section-in.c (lto_get_section_data): Pass info
>   about used compression.
>   * lto-streamer-out.c: By default use zstd when possible.
>   * timevar.def (TV_IPA_LTO_DECOMPRESS): Rename to decompression
>   (TV_IPA_LTO_COMPRESS): Likewise for compression.
OK
jeff


Profile Graph for GIMPLE optimizations in expand_all_functions and other passes

2019-07-02 Thread nick
Giuliano,

I asked for some documentation off you related to the RTL passes. Not sure
if you are just hitting bottlenecks in all_rtl_passes or ipa_passes functions
but it seems that the SSA trees and cfgloop.c and cfgloop.h files optimization
passes would still be a issue. Particularly after the final GIMPLE and IPA
passes it would be great to multi-thread and be able to walk the dominator
trees multi-threaded.

Not sure if you've looked as what seem to be issues here and was wondering
if these are happening in your profiling still. GENERIC to GIMPLE may also
be a issue in gimipfly but less than those.

Again I understand if's out of scope but it would be great if you have a current
profile graph that I can see. It would give me an idea of where to start working
outside of the core GIMPLE optimizations passes your working on. 


Huge thanks and again good luck,

Nick


P.S. Don't worry if you don't it would just be nice to have and rewriting 
multi-threaded
memory allocation is not easy and even more so with the shared state between 
compiler
passes.