Re: [EXT] Re: Can LTO minor version be updated in backward compatible way ?

2019-07-22 Thread Richard Biener
On Fri, Jul 19, 2019 at 10:30 AM Florian Weimer  wrote:
>
> * Romain Geissler:
>
> > That may fly in the open source world, however I expect some vendors
> > shipping proprietary code might be fine with assembly/LTO
> > representation of their product, but not source.
>
> They can't ship LTO today anyway due to the format incompatibility, so
> that's not really an argument against source-based LTO.

Source-based LTO doesn't really work unless you can re-synthesize
source from the IL.  At least I don't see how you can do whole-program
analysis on source and then cut it into appropriate pieces, duplicating
some things and some not to make up for the parallel final compile step.

Richard.

> Thanks,
> Florian


Re: [GSoC'19] Parallelize GCC with Threads -- Second Evaluation Status

2019-07-22 Thread Richard Biener
On Sun, 21 Jul 2019, Giuliano Belinassi wrote:

> Hi all,
> 
> Here is my second evaluation report, together with a simple program that
> I was able to compile with my parallel version of GCC. Keep in mind that
> I still have lots of concurrent issues inside the compiler and therefore
> my branch will fail to compile pretty much anything else.
> 
> To reproduce my current branch, use the following steps:
> 
> 1-) Clone https://gitlab.com/flusp/gcc/tree/giulianob_parallel
> 
> 2-) Edit gcc/cgraphunit.c's variable `num_threads` to 1.
> 
> 3-) Compile with --disable-bootstrap --enable-languages=c
> 
> 4-) make
> 
> 5-) Edit gcc/cgraphunit.c's variable `num_threads` to 2, for instance.
> 
> 6-) make install DESTDIR="somewhere_else_that_doesnt_break_your_gcc"
> 
> 7-) compile the program using -O2
> 
> I am attaching my report in markdown format, which you can convert to PDF
> using `pandoc` if you find it difficult to read in the current format.
> 
> I am also open to suggestions. Please do not hesitate to comment :)

Thanks for the report and it's great that you are making progress!

I suggest you add a --param (edit params.def) so one can choose
num_threads on the command-line instead of needing to recompile GCC.
Just keep the default "safe" so that GCC build itself will still work.
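For reference, a hedged sketch of what such an entry could look like in the GCC 9-era gcc/params.def format — the enumerator, option name, help string, and bounds below are invented for illustration, not taken from any actual patch:

```c
/* Hypothetical gcc/params.def entry; the names and limits here are
   illustrative only.  Format: DEFPARAM (enum, option, help,
   default, min, max).  */
DEFPARAM (PARAM_PARALLEL_JOBS,
	  "parallel-jobs",
	  "Number of threads used when parallelizing intraprocedural passes.",
	  1, 1, 64)
```

The hard-coded `num_threads` could then be read via `PARAM_VALUE (PARAM_PARALLEL_JOBS)`, so `--param parallel-jobs=2` selects it at run time while the default of 1 keeps a regular bootstrap single-threaded.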

For most of the allocators I think that in the end we want to
keep most of them global but have either per-thread freelists
or a freelist implementation that can work (allocate and free)
without locking, employing some RCU scheme.  Not introducing
per-thread state is probably leaner on the implementation.
It would of course mean taking a lock when the freelist needs to
be re-filled from the main pool but that's hopefully not common.
I don't know a RCU allocator freelist implementation to copy/learn
from, but experimenting with such before going the per thread freelist
might be interesting.  Maybe not all allocators need to be treated
equal either.

Your memory-block issue is likely that you added

{
  if (!instance)
instance = XNEW (memory_block_pool);

but as misleading as it is, XNEW doesn't invoke C++ new but
just malloc, so the allocated structure isn't initialized
since its constructor isn't invoked.  Just use

instance = new memory_block_pool;

with that I get helgrind to run (without complaining!) on your
testcase.  I also get to compile gimple-match.c with two threads
for more than one minute before crashing on some EVRP global
state (somehow I knew the passes global state would be quite a
distraction...).

I hope the project will be motivation to cleanup the way we
handle pass-specific global state.

Thanks again,
Richard.


Re: Doubts regarding the _Dependent_ptr keyword

2019-07-22 Thread Richard Biener
On Mon, Jul 22, 2019 at 2:27 AM Akshat Garg  wrote:

> Hi all,
> Consider part of an example(figure 20) from doc P0190R4(
> http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2017/p0190r4.pdf)
> shown below:
>
> 1.  void thread1 (void)
> 2.  {
> 3.int * volatile p;
> 4.p = rcu_dereference(gip);
> 5.if (p)
> 6.assert(*(p+p[0]) == 42);
> 7.  }
> The .gimple code produced is :
>
> 1.  thread1 ()
> 2.  {
> 3.   atomic int * D.1992;
> 4.int * volatile p;
> 5.  {
> 6.atomic int * * __atomic_load_ptr;
> 7.   atomic int * __atomic_load_tmp;
> 8.try
> 9. {
> 10.__atomic_load_ptr = &gip;
> 11._1 = __atomic_load_8 (__atomic_load_ptr, 1);
> 12._2 = (atomic int *) _1;
> 13.__atomic_load_tmp = _2;
> 14.D.1992 = __atomic_load_tmp;
> 15. }
> 16.finally
> 17.  {
> 18.__atomic_load_tmp = {CLOBBER};
> 19.  }
> 20.  }
> 21.   p = D.1992;
> 22.   p.2_3 = p;
> 23.  if (p.2_3 != 0B) goto ; else goto ;
> 24.  :
> 25.   p.3_4 = p;
> 26.  p.4_5 = p;
> 27.   _6 = *p.4_5;
> 28.  _7 = (long unsigned int) _6;
> 29.  _8 = _7 * 4;
> 30.  _9 = p.3_4 + _8;
> 31.  _10 = *_9;
> 32.  _11 = _10 == 42;
> 33.  _12 = (int) _11;
> 34.  assert (_12);
> 35.  :
> 36. }
>
> The assert at line 34 in the .gimple code still breaks the dependency given
> by the user. I believe there should be some SSA-defined variable of p, or p
> itself, in the assert. This happens when I consider the pointer volatile
> qualified. If I consider it _Dependent_ptr qualified then it surely produces
> the dependency-breaking code. Let me know if I am wrong somewhere.
>
>
p appears as memory here; we load its value into p.3_4, offset that by _8,
and load from the computed address into _10, which then appears in the
assert condition.  I think that's as good as it can get ...

Richard.


> -Akshat
>
>
>
>
> On Wed, Jul 17, 2019 at 4:23 PM Akshat Garg  wrote:
>
>> On Tue, Jul 2, 2019 at 9:06 PM Jason Merrill  wrote:
>>
>>> On Mon, Jul 1, 2019 at 8:59 PM Paul E. McKenney 
>>> wrote:
>>> >
>>> > On Tue, Jul 02, 2019 at 05:58:48AM +0530, Akshat Garg wrote:
>>> > > On Tue, Jun 25, 2019 at 9:49 PM Akshat Garg 
>>> wrote:
>>> > >
>>> > > > On Tue, Jun 25, 2019 at 4:04 PM Ramana Radhakrishnan <
>>> > > > ramana@googlemail.com> wrote:
>>> > > >
>>> > > >> On Tue, Jun 25, 2019 at 11:03 AM Akshat Garg 
>>> wrote:
>>> > > >> >
>>> > > >> > As we have some working front-end code for _Dependent_ptr, What
>>> should
>>> > > >> we do next? What I understand, we can start adding the library for
>>> > > >> dependent_ptr and its functions for C corresponding to the ones
>>> we created
>>> > > >> as C++ template library. Then, after that, we can move on to
>>> generating the
>>> > > >> assembly code part.
>>> > > >> >
>>> > > >>
>>> > > >>
>>> > > >> I think the next step is figuring out how to model the Dependent
>>> > > >> pointer information in the IR and figuring out what optimizations
>>> to
>>> > > >> allow or not with that information. At this point , I suspect we
>>> need
>>> > > >> a plan on record and have the conversation upstream on the lists.
>>> > > >>
>>> > > >> I think we need to put down a plan on record.
>>> > > >>
>>> > > >> Ramana
>>> > > >
>>> > > > [CCing gcc mailing list]
>>> > > >
>>> > > > So, shall I start looking over the pointer optimizations only and
>>> see what
>>> > > > information we may be needed on the same examples in the IR itself?
>>> > > >
>>> > > > - Akshat
>>> > > >
>>> > > I have coded an example where equality comparison kills dependency
>>> from the
>>> > > document P0190R4 as shown below :
>>> > >
>>> > > 1. struct rcutest rt = {1, 2, 3};
>>> > > 2. void thread0 ()
>>> > > 3. {
>>> > > 4. rt.a = -42;
>>> > > 5. rt.b = -43;
>>> > > 6. rt.c = -44;
>>> > > 7. rcu_assign_pointer(gp, &rt);
>>> > > 8. }
>>> > > 9.
>>> > > 10. void thread1 ()
>>> > > 11. {
>>> > > 12.int i = -1;
>>> > > 13.int j = -1;
>>> > > 14._Dependent_ptr struct rcutest *p;
>>> > > 15.
>>> > > 16.p = rcu_dereference(gp);
>>> > > 17.j = p->a;
>>> > > 18.   if (p == &rt)
>>> > > 19.i = p->b;  /*Dependency breaking point*/
>>> > > 20.   else if(p)
>>> > > 21.   i = p->c;
>>> > > 22.   assert(i<0);
>>> > > 23.   assert(j<0);
>>> > > 24. }
>>> > > The gimple unoptimized code produced for lines 17-24 is shown below
>>> > >
>>> > > 1. if (p_16 == &rt)
>>> > > 2. goto ; [INV]
>>> > > 3.   else
>>> > > 4.goto ; [INV]
>>> > > 5.
>>> > > 6.   :
>>> > > 7.  i_19 = p_16->b;
>>> > > 8.  goto ; [INV]
>>> > > 9.
>>> > > 10.   :
>>> > > 11.  if (p_16 != 0B)
>>> > > 12.goto ; [INV]
>>> > > 13.  else
>>> > > 14.goto ; [INV]
>>> > > 15.
>>> > > 16.   :
>>> > > 17.  i_18 = p_16->c;
>>> > > 18.
>>> > > 19.   :
>>> > > 20.  # i_7 = PHI 
>>> > > 21.  _3 = i_7 < 0;
>>> > > 22.  _4 = (int) _3;
>>> > > 23.  assert (_4);
>>> > > 24.  _5 = j_17 < 0;
>>> > > 25.  _6 = (int) _5;
>>> > > 26.  assert (_6);
>>> > > 27.  return;
>>> > >
>>> > > The optimize

Re: Doubts regarding the _Dependent_ptr keyword

2019-07-22 Thread Akshat Garg
On Mon, Jul 22, 2019 at 2:11 PM Richard Biener 
wrote:

> On Mon, Jul 22, 2019 at 2:27 AM Akshat Garg  wrote:
>
>> Hi all,
>> Consider part of an example(figure 20) from doc P0190R4(
>> http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2017/p0190r4.pdf)
>> shown below:
>>
>> 1.  void thread1 (void)
>> 2.  {
>> 3.int * volatile p;
>> 4.p = rcu_dereference(gip);
>> 5.if (p)
>> 6.assert(*(p+p[0]) == 42);
>> 7.  }
>> The .gimple code produced is :
>>
>> 1.  thread1 ()
>> 2.  {
>> 3.   atomic int * D.1992;
>> 4.int * volatile p;
>> 5.  {
>> 6.atomic int * * __atomic_load_ptr;
>> 7.   atomic int * __atomic_load_tmp;
>> 8.try
>> 9. {
>> 10.__atomic_load_ptr = &gip;
>> 11._1 = __atomic_load_8 (__atomic_load_ptr, 1);
>> 12._2 = (atomic int *) _1;
>> 13.__atomic_load_tmp = _2;
>> 14.D.1992 = __atomic_load_tmp;
>> 15. }
>> 16.finally
>> 17.  {
>> 18.__atomic_load_tmp = {CLOBBER};
>> 19.  }
>> 20.  }
>> 21.   p = D.1992;
>> 22.   p.2_3 = p;
>> 23.  if (p.2_3 != 0B) goto ; else goto ;
>> 24.  :
>> 25.   p.3_4 = p;
>> 26.  p.4_5 = p;
>> 27.   _6 = *p.4_5;
>> 28.  _7 = (long unsigned int) _6;
>> 29.  _8 = _7 * 4;
>> 30.  _9 = p.3_4 + _8;
>> 31.  _10 = *_9;
>> 32.  _11 = _10 == 42;
>> 33.  _12 = (int) _11;
>> 34.  assert (_12);
>> 35.  :
>> 36. }
>>
>> The assert at line 34 in the .gimple code still breaks the dependency
>> given by the user. I believe there should be some SSA-defined variable of
>> p, or p itself, in the assert. This happens when I consider the pointer
>> volatile qualified. If I consider it _Dependent_ptr qualified then it
>> surely produces the dependency-breaking code. Let me know if I am wrong
>> somewhere.
>>
>>
> p appears as memory here; we load its value into p.3_4, offset that by _8,
> and load from the computed address into _10, which then appears in the
> assert condition.  I think that's as good as it can
> get ...
>
> Richard.
>

Thank you for your reply. For the same example above, consider this (
https://github.com/AKG001/rtl_opt/blob/master/p0190r4_fig20.c.239r.cse1#L402)
instruction at the RTL level, changed from this (
https://github.com/AKG001/rtl_opt/blob/master/p0190r4_fig20.c.239r.cse1#L231)
during the cse pass. The variable p.2_3 gets replaced by a temporary _1, but
_1 is not a dependent pointer where p.2_3 was. Is this also not breaking
any dependencies?

-Akshat
>>
>>
>>
>>
>> On Wed, Jul 17, 2019 at 4:23 PM Akshat Garg  wrote:
>>
>>> On Tue, Jul 2, 2019 at 9:06 PM Jason Merrill  wrote:
>>>
 On Mon, Jul 1, 2019 at 8:59 PM Paul E. McKenney 
 wrote:
 >
 > On Tue, Jul 02, 2019 at 05:58:48AM +0530, Akshat Garg wrote:
 > > On Tue, Jun 25, 2019 at 9:49 PM Akshat Garg 
 wrote:
 > >
 > > > On Tue, Jun 25, 2019 at 4:04 PM Ramana Radhakrishnan <
 > > > ramana@googlemail.com> wrote:
 > > >
 > > >> On Tue, Jun 25, 2019 at 11:03 AM Akshat Garg 
 wrote:
 > > >> >
 > > >> > As we have some working front-end code for _Dependent_ptr,
 What should
 > > >> we do next? What I understand, we can start adding the library
 for
 > > >> dependent_ptr and its functions for C corresponding to the ones
 we created
 > > >> as C++ template library. Then, after that, we can move on to
 generating the
 > > >> assembly code part.
 > > >> >
 > > >>
 > > >>
 > > >> I think the next step is figuring out how to model the Dependent
 > > >> pointer information in the IR and figuring out what
 optimizations to
 > > >> allow or not with that information. At this point , I suspect we
 need
 > > >> a plan on record and have the conversation upstream on the lists.
 > > >>
 > > >> I think we need to put down a plan on record.
 > > >>
 > > >> Ramana
 > > >
 > > > [CCing gcc mailing list]
 > > >
 > > > So, shall I start looking over the pointer optimizations only and
 see what
 > > > information we may be needed on the same examples in the IR
 itself?
 > > >
 > > > - Akshat
 > > >
 > > I have coded an example where equality comparison kills dependency
 from the
 > > document P0190R4 as shown below :
 > >
 > > 1. struct rcutest rt = {1, 2, 3};
 > > 2. void thread0 ()
 > > 3. {
 > > 4. rt.a = -42;
 > > 5. rt.b = -43;
 > > 6. rt.c = -44;
 > > 7. rcu_assign_pointer(gp, &rt);
 > > 8. }
 > > 9.
 > > 10. void thread1 ()
 > > 11. {
 > > 12.int i = -1;
 > > 13.int j = -1;
 > > 14._Dependent_ptr struct rcutest *p;
 > > 15.
 > > 16.p = rcu_dereference(gp);
 > > 17.j = p->a;
 > > 18.   if (p == &rt)
 > > 19.i = p->b;  /*Dependency breaking point*/
 > > 20.   else if(p)
 > > 21.   i = p->c;
 > > 22.   assert(i<0);
 > > 23.   assert(j<0);
 > > 24. }
 > > The gimple unoptimized code produced for lin

Re: [GSoC'19] Parallelize GCC with Threads -- Second Evaluation Status

2019-07-22 Thread Richard Biener
On Mon, 22 Jul 2019, Richard Biener wrote:

> On Sun, 21 Jul 2019, Giuliano Belinassi wrote:
> 
> > Hi all,
> > 
> > Here is my second evaluation report, together with a simple program that
> > I was able to compile with my parallel version of GCC. Keep in mind that
> > I still have lots of concurrent issues inside the compiler and therefore
> > my branch will fail to compile pretty much anything else.
> > 
> > To reproduce my current branch, use the following steps:
> > 
> > 1-) Clone https://gitlab.com/flusp/gcc/tree/giulianob_parallel
> > 
> > 2-) Edit gcc/cgraphunit.c's variable `num_threads` to 1.
> > 
> > 3-) Compile with --disable-bootstrap --enable-languages=c
> > 
> > 4-) make
> > 
> > 5-) Edit gcc/cgraphunit.c's variable `num_threads` to 2, for instance.
> > 
> > 6-) make install DESTDIR="somewhere_else_that_doesnt_break_your_gcc"
> > 
> > 7-) compile the program using -O2
> > 
> > I am attaching my report in markdown format, which you can convert to PDF
> > using `pandoc` if you find it difficult to read in the current format.
> > 
> > I am also open to suggestions. Please do not hesitate to comment :)
> 
> Thanks for the report and it's great that you are making progress!
> 
> I suggest you add a --param (edit params.def) so one can choose
> num_threads on the command-line instead of needing to recompile GCC.
> Just keep the default "safe" so that GCC build itself will still work.
> 
> For most of the allocators I think that in the end we want to
> keep most of them global but have either per-thread freelists
> or a freelist implementation that can work (allocate and free)
> without locking, employing some RCU scheme.  Not introducing
> per-thread state is probably leaner on the implementation.
> It would of course mean taking a lock when the freelist needs to
> be re-filled from the main pool but that's hopefully not common.
> I don't know a RCU allocator freelist implementation to copy/learn
> from, but experimenting with such before going the per thread freelist
> might be interesting.  Maybe not all allocators need to be treated
> equal either.
> 
> Your memory-block issue is likely that you added
> 
> {
>   if (!instance)
> instance = XNEW (memory_block_pool);
> 
> but as misleading as it is, XNEW doesn't invoke C++ new but
> just malloc, so the allocated structure isn't initialized
> since its constructor isn't invoked.  Just use
> 
> instance = new memory_block_pool;
> 
> with that I get helgrind to run (without complaining!) on your
> testcase.  I also get to compile gimple-match.c with two threads
> for more than one minute before crashing on some EVRP global
> state (somehow I knew the passes global state would be quite a
> distraction...).
> 
> I hope the project will be motivation to cleanup the way we
> handle pass-specific global state.

Btw, to get to a "working" state quicker you might consider
concentrating on a pass subset for which you can conveniently
restrict optimization to just -Og, effectively parallelizing
pass_all_optimizations_g only; you then probably hit more of the
issues in infrastructure, which is more interesting for the
project (we know there's a lot of pass-specific global state...).
Of course the time spent in pass_all_optimizations_g is minimal...

I then hit tree-ssa-live.c:usedvars quickly (slap __thread on it)
and after that the EVRP issue via the sprintf_length pass.

Richard.



Re: Doubts regarding the _Dependent_ptr keyword

2019-07-22 Thread Richard Biener
On Mon, Jul 22, 2019 at 10:54 AM Akshat Garg  wrote:

> On Mon, Jul 22, 2019 at 2:11 PM Richard Biener 
> wrote:
>
>> On Mon, Jul 22, 2019 at 2:27 AM Akshat Garg  wrote:
>>
>>> Hi all,
>>> Consider part of an example(figure 20) from doc P0190R4(
>>> http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2017/p0190r4.pdf)
>>> shown below:
>>>
>>> 1.  void thread1 (void)
>>> 2.  {
>>> 3.int * volatile p;
>>> 4.p = rcu_dereference(gip);
>>> 5.if (p)
>>> 6.assert(*(p+p[0]) == 42);
>>> 7.  }
>>> The .gimple code produced is :
>>>
>>> 1.  thread1 ()
>>> 2.  {
>>> 3.   atomic int * D.1992;
>>> 4.int * volatile p;
>>> 5.  {
>>> 6.atomic int * * __atomic_load_ptr;
>>> 7.   atomic int * __atomic_load_tmp;
>>> 8.try
>>> 9. {
>>> 10.__atomic_load_ptr = &gip;
>>> 11._1 = __atomic_load_8 (__atomic_load_ptr, 1);
>>> 12._2 = (atomic int *) _1;
>>> 13.__atomic_load_tmp = _2;
>>> 14.D.1992 = __atomic_load_tmp;
>>> 15. }
>>> 16.finally
>>> 17.  {
>>> 18.__atomic_load_tmp = {CLOBBER};
>>> 19.  }
>>> 20.  }
>>> 21.   p = D.1992;
>>> 22.   p.2_3 = p;
>>> 23.  if (p.2_3 != 0B) goto ; else goto ;
>>> 24.  :
>>> 25.   p.3_4 = p;
>>> 26.  p.4_5 = p;
>>> 27.   _6 = *p.4_5;
>>> 28.  _7 = (long unsigned int) _6;
>>> 29.  _8 = _7 * 4;
>>> 30.  _9 = p.3_4 + _8;
>>> 31.  _10 = *_9;
>>> 32.  _11 = _10 == 42;
>>> 33.  _12 = (int) _11;
>>> 34.  assert (_12);
>>> 35.  :
>>> 36. }
>>>
>>> The assert at line 34 in the .gimple code still breaks the dependency
>>> given by the user. I believe there should be some SSA-defined variable
>>> of p, or p itself, in the assert. This happens when I consider the
>>> pointer volatile qualified. If I consider it _Dependent_ptr qualified
>>> then it surely produces the dependency-breaking code. Let me know if I
>>> am wrong somewhere.
>>>
>>>
>> p appears as memory here; we load its value into p.3_4, offset that by
>> _8, and load from the computed address into _10, which then appears in
>> the assert condition.  I think that's as good as it can
>> get ...
>>
>> Richard.
>>
>
> Thank you for your reply. For the same example above, consider this (
> https://github.com/AKG001/rtl_opt/blob/master/p0190r4_fig20.c.239r.cse1#L402)
> instruction at the RTL level, changed from this (
> https://github.com/AKG001/rtl_opt/blob/master/p0190r4_fig20.c.239r.cse1#L231)
> during the cse pass. The variable p.2_3 gets replaced by a temporary _1,
> but _1 is not a dependent pointer where p.2_3 was. Is this also not
> breaking any dependencies?
>

I'm not sure.  In general CSE can break dependences.  If the dependent
pointer chain needs to cover multiple levels of indirection from the
original atomic operation, you need to make sure not to expose the
atomics as CSEable.  Thus on RTL, make them all UNSPECs.
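For illustration, keeping such a load opaque at the RTL level via an unspec might take roughly the following machine-description shape (purely schematic — the pattern name, mode, and assembly template are invented, not from any real backend):

```lisp
;; Illustrative only.  Wrapping the load in an UNSPEC keeps CSE and
;; other RTL passes from treating it as an ordinary, combinable
;; memory reference, preserving the address dependency.
(define_insn "atomic_load_dependent"
  [(set (match_operand:DI 0 "register_operand" "=r")
	(unspec:DI [(match_operand:DI 1 "memory_operand" "m")]
		   UNSPEC_ATOMIC_LOAD_DEP))]
  ""
  "ld\t%0, %1")
```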

Richard.


> -Akshat
>>>
>>>
>>>
>>>
>>> On Wed, Jul 17, 2019 at 4:23 PM Akshat Garg  wrote:
>>>
 On Tue, Jul 2, 2019 at 9:06 PM Jason Merrill  wrote:

> On Mon, Jul 1, 2019 at 8:59 PM Paul E. McKenney 
> wrote:
> >
> > On Tue, Jul 02, 2019 at 05:58:48AM +0530, Akshat Garg wrote:
> > > On Tue, Jun 25, 2019 at 9:49 PM Akshat Garg 
> wrote:
> > >
> > > > On Tue, Jun 25, 2019 at 4:04 PM Ramana Radhakrishnan <
> > > > ramana@googlemail.com> wrote:
> > > >
> > > >> On Tue, Jun 25, 2019 at 11:03 AM Akshat Garg 
> wrote:
> > > >> >
> > > >> > As we have some working front-end code for _Dependent_ptr,
> What should
> > > >> we do next? What I understand, we can start adding the library
> for
> > > >> dependent_ptr and its functions for C corresponding to the ones
> we created
> > > >> as C++ template library. Then, after that, we can move on to
> generating the
> > > >> assembly code part.
> > > >> >
> > > >>
> > > >>
> > > >> I think the next step is figuring out how to model the Dependent
> > > >> pointer information in the IR and figuring out what
> optimizations to
> > > >> allow or not with that information. At this point , I suspect
> we need
> > > >> a plan on record and have the conversation upstream on the
> lists.
> > > >>
> > > >> I think we need to put down a plan on record.
> > > >>
> > > >> Ramana
> > > >
> > > > [CCing gcc mailing list]
> > > >
> > > > So, shall I start looking over the pointer optimizations only
> and see what
> > > > information we may be needed on the same examples in the IR
> itself?
> > > >
> > > > - Akshat
> > > >
> > > I have coded an example where equality comparison kills dependency
> from the
> > > document P0190R4 as shown below :
> > >
> > > 1. struct rcutest rt = {1, 2, 3};
> > > 2. void thread0 ()
> > > 3. {
> > > 4. rt.a = -42;
> > > 5. rt.b = -43;
> > > 6. rt.c = -44;
> > > 7. rcu_assign_pointer(gp, &rt);
> > > 8. }
> > > 9.
> > >

gcc/config/arch/arch.opt: Option mask gen problem

2019-07-22 Thread Maxim Blinov
Hi all,

Is it possible, in the arch.opt file, to have GCC generate a bitmask
relative to a user-defined variable without an associated option name? To
illustrate my problem, consider the following option file snippet:

...
Variable
HOST_WIDE_INT riscv_bitmanip_flags = 0
...
mbmi-zbb
Target Mask(BITMANIP_ZBB) Var(riscv_bitmanip_flags)
Support the base subset of the Bitmanip extension.
...

This generates the following lines in the build/gcc/options.h (marker
added by me for clarity):

...
#define OPTION_MASK_BITMANIP_ZBB (HOST_WIDE_INT_1U << 0) // <--
#define OPTION_MASK_BITMANIP_ZBC (HOST_WIDE_INT_1U << 1)
#define OPTION_MASK_BITMANIP_ZBE (HOST_WIDE_INT_1U << 2)
#define OPTION_MASK_BITMANIP_ZBF (HOST_WIDE_INT_1U << 3)
#define OPTION_MASK_BITMANIP_ZBM (HOST_WIDE_INT_1U << 4)
#define OPTION_MASK_BITMANIP_ZBP (HOST_WIDE_INT_1U << 5)
#define OPTION_MASK_BITMANIP_ZBR (HOST_WIDE_INT_1U << 6)
#define OPTION_MASK_BITMANIP_ZBS (HOST_WIDE_INT_1U << 7)
#define OPTION_MASK_BITMANIP_ZBT (HOST_WIDE_INT_1U << 8)
#define MASK_DIV (1U << 0)
#define MASK_EXPLICIT_RELOCS (1U << 1)
#define MASK_FDIV (1U << 2)
#define MASK_SAVE_RESTORE (1U << 3)
#define MASK_STRICT_ALIGN (1U << 4)
#define MASK_64BIT (1U << 5)
#define MASK_ATOMIC (1U << 6)
#define MASK_BITMANIP (1U << 7)
#define MASK_DOUBLE_FLOAT (1U << 8)
#define MASK_HARD_FLOAT (1U << 9)
#define MASK_MUL (1U << 10)
#define MASK_RVC (1U << 11)
#define MASK_RVE (1U << 12)
...

But, I don't want the user to be able to pass "-mbmi-zbb" or
"-mno-bmi-zbb" on the command line: I only want the generation of the
`x_riscv_bitmanip_flags` variable, and the associated bitmasks so that
I can use them elsewhere in the backend code. So, I remove the name
and description from the entry, like so:

...
Target Mask(BITMANIP_ZBB) Var(riscv_bitmanip_flags)
...

But now, in the build/gcc/options.h file, the bitmask becomes relative
to the generic `x_target_flags` variable:

#define OPTION_MASK_BITMANIP_ZBC (HOST_WIDE_INT_1U << 0)
#define OPTION_MASK_BITMANIP_ZBE (HOST_WIDE_INT_1U << 1)
#define OPTION_MASK_BITMANIP_ZBF (HOST_WIDE_INT_1U << 2)
#define OPTION_MASK_BITMANIP_ZBM (HOST_WIDE_INT_1U << 3)
#define OPTION_MASK_BITMANIP_ZBP (HOST_WIDE_INT_1U << 4)
#define OPTION_MASK_BITMANIP_ZBR (HOST_WIDE_INT_1U << 5)
#define OPTION_MASK_BITMANIP_ZBS (HOST_WIDE_INT_1U << 6)
#define OPTION_MASK_BITMANIP_ZBT (HOST_WIDE_INT_1U << 7)
#define MASK_DIV (1U << 0)
#define MASK_EXPLICIT_RELOCS (1U << 1)
#define MASK_FDIV (1U << 2)
#define MASK_SAVE_RESTORE (1U << 3)
#define MASK_STRICT_ALIGN (1U << 4)
#define MASK_64BIT (1U << 5)
#define MASK_ATOMIC (1U << 6)
#define MASK_BITMANIP (1U << 7)
#define MASK_DOUBLE_FLOAT (1U << 8)
#define MASK_HARD_FLOAT (1U << 9)
#define MASK_MUL (1U << 10)
#define MASK_RVC (1U << 11)
#define MASK_RVE (1U << 12)
#define MASK_BITMANIP_ZBB (1U << 13) // <--

Could someone suggest a way to get around this problem in the .opt file?

Best Regards,
Maxim


Re: [EXT] Re: Can LTO minor version be updated in backward compatible way ?

2019-07-22 Thread Florian Weimer
* Richard Biener:

> On Fri, Jul 19, 2019 at 10:30 AM Florian Weimer  wrote:
>>
>> * Romain Geissler:
>>
>> > That may fly in the open source world, however I expect some vendors
>> > shipping proprietary code might be fine with assembly/LTO
>> > representation of their product, but not source.
>>
>> They can't ship LTO today anyway due to the format incompatibility, so
>> that's not really an argument against source-based LTO.
>
> Source-based LTO doesn't really work unless you can re-synthesize
> source from the IL.  At least I don't see how you can do whole-program
> analysis on source and then cut it into appropriate pieces, duplicating
> some things and some not to make up for the parallel final compile step.

Oh, I meant using source code only as a portable serialization of the
program, instead of serializing unstable, compiler-specific IR.  If the
whole program does not fit into memory, the compiler will still have to
maintain on-disk data structures, but at least there wouldn't be a
compatibility aspect to those anymore.

Thanks,
Florian


Re: [EXT] Re: Can LTO minor version be updated in backward compatible way ?

2019-07-22 Thread Richard Biener
On Mon, Jul 22, 2019 at 1:15 PM Florian Weimer  wrote:
>
> * Richard Biener:
>
> > On Fri, Jul 19, 2019 at 10:30 AM Florian Weimer  wrote:
> >>
> >> * Romain Geissler:
> >>
> >> > That may fly in the open source world, however I expect some vendors
> >> > shipping proprietary code might be fine with assembly/LTO
> >> > representation of their product, but not source.
> >>
> >> They can't ship LTO today anyway due to the format incompatibility, so
> >> that's not really an argument against source-based LTO.
> >
> > Source-based LTO doesn't really work unless you can re-synthesize
> > source from the IL.  At least I don't see how you can do whole-program
> > analysis on source and then cut it into appropriate pieces, duplicating
> > some things and some not to make up for the parallel final compile step.
>
> Oh, I meant using source code only as a portable serialization of the
> program, instead of serializing unstable, compiler-specific IR.  If the
> whole program does not fit into memory, the compiler will still have to
> maintain on-disk data structures, but at least there wouldn't be a
> compatibility aspect to those anymore.

OK, but then we'd need to re-do the compile and IPA analysis stage
at each link with the appropriate frontend.  But sure, that would be
possible.

Richard.

> Thanks,
> Florian


Re: Can LTO minor version be updated in backward compatible way ?

2019-07-22 Thread Martin Liška
On 7/17/19 8:10 PM, Jeff Law wrote:
> On 7/17/19 11:29 AM, Andi Kleen wrote:
>> Romain Geissler  writes:
>>>
>>> I have no idea of the LTO format and if indeed it can easily be updated
>>> in a backward compatible way. But I would say it would be nice if it
>>> could, and would allow adoption for projects spread on many teams
>>> depending on each others and unable to re-build everything at each
>>> toolchain update.
>>
>> Right now any change to a compiler option breaks the LTO format
>> in subtle ways. In fact even the minor changes that are currently
>> done are not frequent enough to catch all such cases.
>>
>> So it's unlikely to really work.
> Right and stable LTO bytecode really isn't on the radar at this time.
> 
> IMHO it's more important right now to start pushing LTO into the
> mainstream for the binaries shipped by the vendors (and stripping the
> LTO bits out of any static libraries/.o's shipped by the vendors).
> 
> 
> SuSE's announcement today is quite ironic. 

Why and what is ironic about it?

> Red Hat's toolchain team is
> planning to propose switching to LTO by default for Fedora 32 and were
> working through various details yesterday.

Great!

>  Our proposal will almost
> certainly include stripping out the LTO bits from .o's and any static
> libraries.

Yes, we do it as well for now.

Martin

> 
> Jeff
> 



Re: Can LTO minor version be updated in backward compatible way ?

2019-07-22 Thread Jeff Law
On 7/22/19 8:25 AM, Martin Liška wrote:
> On 7/17/19 8:10 PM, Jeff Law wrote:
>> On 7/17/19 11:29 AM, Andi Kleen wrote:
>>> Romain Geissler  writes:

 I have no idea of the LTO format and if indeed it can easily be updated
 in a backward compatible way. But I would say it would be nice if it
 could, and would allow adoption for projects spread on many teams
 depending on each others and unable to re-build everything at each
 toolchain update.
>>>
>>> Right now any change to a compiler option breaks the LTO format
>>> in subtle ways. In fact even the minor changes that are currently
>>> done are not frequent enough to catch all such cases.
>>>
>>> So it's unlikely to really work.
>> Right and stable LTO bytecode really isn't on the radar at this time.
>>
>> IMHO it's more important right now to start pushing LTO into the
>> mainstream for the binaries shipped by the vendors (and stripping the
>> LTO bits out of any static libraries/.o's shipped by the vendors).
>>
>>
>> SuSE's announcement today is quite ironic. 
> 
> Why and what is ironic about it?
Sorry, you'd have to have internal context -- we'd been discussing it
within the Red Hat team for Fedora 32 the previous day.  One of the
questions that came up was whether or not any other major distributor
was shipping with LTO enabled :-)


Jeff


Re: [RFC] Disabling ICF for interrupt functions

2019-07-22 Thread Jozef Lawrynowicz
Hi,

On Fri, 19 Jul 2019 16:32:21 +0300 (MSK)
Alexander Monakov  wrote:

> On Fri, 19 Jul 2019, Jozef Lawrynowicz wrote:
> 
> > For MSP430, the folding of identical functions marked with the "interrupt"
> > attribute by -fipa-icf-functions results in wrong code being generated.
> > Interrupts have different calling conventions than regular functions, so
> > inserting a CALL from one identical interrupt to another is not correct and
> > will result in stack corruption.  
> 
> But ICF by creating an alias would be fine, correct?  As I understand, the
> real issue here is that gcc does not know how to correctly emit a call to
> "interrupt" functions (because they have unusual ABI and exist basically to
> have their address stored somewhere).

Yes I presume in most cases an alias would be ok. It's just that users
sometimes do funky things with interrupt functions to achieve the best possible
performance for their programs, so I wouldn't want to rule out that identical
interrupts may need distinct addresses in some situations. I cannot think of a
use case for that right now though.

So having the option to disable it somehow would be desirable.

> 
> So I think the solution shouldn't be in disabling ICF altogether, but rather
> in adding a way to recognize that a function has quasi-unknown ABI and thus
> not directly callable (so any other optimization can see that it may not emit
> a call to this function), then teaching ICF to check that when deciding to
> fold by creating a wrapper.

I agree, this is a nice suggestion. "call" instructions should not be
allowed to be generated at all for interrupt functions on MSP430 (and
whichever other targets need this), whether from the user explicitly
calling the interrupt in their code or from GCC generating the call.

This would have to be caught at the point that an optimization pass
first considers inserting a CALL to the interrupt, i.e., if the machine
description tries to prevent the generation of a call to an interrupt function
once the RTL has been generated (e.g. by blanking on the define_expand for
"call"), we are going to have ICEs/wrong code generated a lot of the time.
Particularly in the case originally mentioned here - there would be an empty
interrupt function.

> 
> (would it be possible to tell ICF that addresses of interrupt functions are
> not significant so it can fold them by creating aliases?)

I'll take a look.

Thanks,
Jozef


> 
> Alexander



Re: Can LTO minor version be updated in backward compatible way ?

2019-07-22 Thread Jeffrey Walton
On Wed, Jul 17, 2019 at 2:10 PM Jeff Law  wrote:
>
> ...
> SuSE's announcement today is quite ironic.  Red Hat's toolchain team is
> planning to propose switching to LTO by default for Fedora 32 and were
> working through various details yesterday.  Our proposal will almost
> certainly include stripping out the LTO bits from .o's and any static
> libraries.

Be sure to include an ARMv7 test case where one source file uses the
default arch flags, and one source file uses -march=armv7-a
-mfpu=neon (with runtime feature checking):

for example:

a.cpp - default flags
b.cpp - -march=armv7-a -mfpu=neon

We can't seem to get around errors like this during the link step driven through GCC:

[  303s] /usr/lib/gcc/armv7hl-suse-linux-gnueabi/9/include/arm_neon.h:4835:48:
fatal error: You must enable NEON instructions (e.g.
'-mfloat-abi=softfp' '-mfpu=neon') to use these intrinsics.
[  303s]  4835 |   return (uint32x4_t)__builtin_neon_vshl_nv4si
((int32x4_t) __a, __b);
[  303s]   |^
[  303s] compilation terminated.

The only thing we have found to sidestep the problem is, disable LTO for ARM.
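
A minimal version of that test case might look like the following sketch
(file names and the runtime check are illustrative, not taken from an
actual report; on 32-bit ARM Linux, NEON availability is HWCAP bit 12):

```c
/* a.c -- compiled with the default arch flags.  Performs runtime
   feature detection before dispatching to the NEON path in b.c.  */
#include <stdio.h>
#include <sys/auxv.h>

extern void add4_neon (const int *a, const int *b, int *out);

int
main (void)
{
  /* HWCAP_NEON == (1 << 12) on 32-bit ARM Linux.  */
  if (getauxval (AT_HWCAP) & (1UL << 12))
    puts ("NEON path available");
  else
    puts ("using scalar fallback");
  return 0;
}

/* b.c -- compiled with -march=armv7-a -mfpu=neon.  Under -flto the
   per-file target options must survive into the link-time compile,
   otherwise arm_neon.h fails exactly as in the error above.  */
#include <arm_neon.h>

void
add4_neon (const int32_t *a, const int32_t *b, int32_t *out)
{
  vst1q_s32 (out, vaddq_s32 (vld1q_s32 (a), vld1q_s32 (b)));
}
```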

Jeff


Re: [RFC] Disabling ICF for interrupt functions

2019-07-22 Thread Alexander Monakov
On Mon, 22 Jul 2019, Jozef Lawrynowicz wrote:

> This would have to be caught at the point that an optimization pass
> first considers inserting a CALL to the interrupt, i.e., if the machine
> description tries to prevent the generation of a call to an interrupt function
> once the RTL has been generated (e.g. by blanking on the define_expand for
> "call"), we are going to have ICEs/wrong code generated a lot of the time.
> Particularly in the case originally mentioned here - there would be an empty
> interrupt function.

Yeah, I imagine it would need to be a new target hook direct_call_allowed_p
receiving a function decl, or something like that.
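
For illustration, such a hook might be sketched roughly like this (the hook
name, its documentation, its default, and the MSP430 predicate are all
hypothetical -- this is not an existing GCC interface):

```c
/* In target.def: */
DEFHOOK
(direct_call_allowed_p,
 "Return true if a direct call to @var{fndecl} may be emitted.\n\
Targets where some functions (e.g. interrupt handlers) follow a\n\
non-standard ABI can return false so that passes such as ICF never\n\
synthesize calls to them.",
 bool, (const_tree fndecl),
 hook_bool_const_tree_true)

/* A target implementation would then reject its interrupt handlers: */
static bool
msp430_direct_call_allowed_p (const_tree fndecl)
{
  return !msp430_is_interrupt_func_p (fndecl);  /* hypothetical helper */
}
```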

> > (would it be possible to tell ICF that addresses of interrupt functions are
> > not significant so it can fold them by creating aliases?)
> 
> I'll take a look.

Sorry, I didn't say explicitly, but that was meant more as a remark to IPA
maintainers: currently in GCC "address taken" implies "address significant",
so "address not significant" would have to be a new attribute, or a new decl
bit (maybe preferable for languages where function addresses are not significant
by default).

Alexander


Re: Re: [GSoC'19, libgomp work-stealing] Task parallelism runtime

2019-07-22 Thread Jakub Jelinek
On Sun, Jul 21, 2019 at 04:46:33PM +0900, 김규래 wrote:
> About the snippet below, 
>  
>   if (gomp_barrier_last_thread (state)) 
> {
>   if (team->task_count == 0) 
> {
>   gomp_team_barrier_done (&team->barrier, state);
>   gomp_mutex_unlock (&team->task_lock);
>   gomp_team_barrier_wake (&team->barrier, 0);
>   return;
> }
>   gomp_team_barrier_set_waiting_for_tasks (&team->barrier);
> }
> 
> Am I safe to assume that gomp_barrier_last_thread is thread-safe?

Yes, you can look up the definition.
gomp_barrier_last_thread is just a bit in the state bitmask passed to the
routine, it is set on the last thread that encounters the barrier, which is
figured out by doing atomic subtraction from the counter.

Jakub


Re: gcc/config/arch/arch.opt: Option mask gen problem

2019-07-22 Thread Jim Wilson
On Mon, Jul 22, 2019 at 4:05 AM Maxim Blinov  wrote:
> Is it possible, in the arch.opt file, to have GCC generate a bitmask
> relative to a user-defined variable without an associated name? To
> illustrate my problem, consider the following option file snippet:
> ...
> But, I don't want the user to be able to pass "-mbmi-zbb" or
> "-mno-bmi-zbb" on the command line:

If you don't want an option, why are you making changes to the
riscv.opt file?  This is specifically for supporting command line
options.

Adding a variable here does mean that it will automatically be saved
and restored, and I can see the advantage of doing that, even if it is
only indirectly tied to options.  You could add a variable here, and
then manually define the bitmasks yourself in riscv-opt.h or riscv.h.
Or you could just add the variable to the machine_function struct in
riscv.c, which will also automatically save and restore the variable.
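
Concretely, that suggestion could look like the following sketch (the
variable and mask names are made up for illustration):

```c
/* In riscv.opt -- a bare Variable record: no option name is attached,
   so nothing like -mbmi-zbb is exposed on the command line, but the
   variable is still saved and restored automatically:

     Variable
     unsigned int riscv_bmi_subsets

   Then in riscv-opt.h (or riscv.h), lay out the bits by hand:  */
#define MASK_ZBB (1U << 0)
#define MASK_ZBS (1U << 1)

#define TARGET_ZBB ((riscv_bmi_subsets & MASK_ZBB) != 0)
#define TARGET_ZBS ((riscv_bmi_subsets & MASK_ZBS) != 0)
```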

Jim


Re: Re: [GSoC'19, libgomp work-stealing] Task parallelism runtime

2019-07-22 Thread 김규래
> Yes, you can look up the definition.
> gomp_ barrier_last_thread is just a bit in the state bitmask passed to the
> routine, it is set on the last thread that encounters the barrier, which is
> figured out by doing atomic subtraction from the counter.

I saw the implementation, just wanted to be sure that's the general case.
Thanks.
 
Ray Kim
 


flow control statement

2019-07-22 Thread Ali MURAT

From which header file are they (the for and while loops) called?

Can you tell me where they (e.g. the for or while loops) are stored, like
iostream?

I wonder about their source code. I'll look at (investigate) them and try
to write a new header.