_Dependent_ptr problems in RTL passes

2019-07-17 Thread Akshat Garg
Hi all,

We are working on preventing memory_order_consume from being promoted to
memory_order_acquire. Here is a little background on the work we are doing:
https://gcc.gnu.org/ml/gcc/2019-07/msg00038.html

We are able to parse _Dependent_ptr in the C front end. The patch files
are here:
https://github.com/AKG001/gcc/commit/2accdd2b43100abae937c714eb4c8e385940b5c7
https://github.com/AKG001/gcc/commit/fb4187bc3872a50880159232cf336f0a03505fa8

Currently, we are working on pointers only.
As discussed earlier, there are certain passes, at the tree and RTL level,
that may break the dependencies specified by the user. We are interested
in the problems that could arise during the RTL passes. For that, we have
tried to skip the tree passes by treating _Dependent_ptr as volatile. The
patch for this is here:
https://github.com/AKG001/gcc/commit/e4ffd77f62ace986c124a70b90a662c083c570ba

We are trying to find all the passes where the dependencies could get
broken. We have experimented on certain examples and have some doubts
about one of them; we hope the community can help us. The example is
figure 20 from here (
http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2017/p0190r4.pdf).

The example:  (https://github.com/AKG001/rtl_opt/blob/master/p0190r4_fig20.c).

The .optimized code:
https://github.com/AKG001/rtl_opt/blob/master/p0190r4_fig20.c.231t.optimized
The .expand code:
https://github.com/AKG001/rtl_opt/blob/master/p0190r4_fig20.c.233r.expand
The .cse1 code:
https://github.com/AKG001/rtl_opt/blob/master/p0190r4_fig20.c.239r.cse1
The .final code:
https://github.com/AKG001/rtl_opt/blob/master/p0190r4_fig20.c.317r.final

In the .expand code, I believe there are no dependencies that get
broken; I hope someone can verify this as well. But in the .cse1 code,
between the instruction at line 231, shown below,

1.  (insn 10 9 11 2 (set (reg:CCZ 17 flags)
2.   (compare:CCZ (reg/f:DI 84 [ p.2_3 ])
3.  (const_int 0 [0]))) "p0190r4_fig20.c":44:6 8  {*cmpdi_ccno_1}
4.  (nil))

and the instruction at line 402, shown below,
5.  (insn 10 9 11 2 (set (reg:CCZ 17 flags)
6.  (compare:CCZ (reg:DI 82 [ _1 ])
7.(const_int 0 [0]))) "p0190r4_fig20.c":44:6 8 {*cmpdi_ccno_1}
8.  (expr_list:REG_DEAD (reg/f:DI 84 [ p.2_3 ])
9.  (nil)))
the dependencies get broken.

The instruction starting at line 1 gets changed into the instruction
starting at line 5, which refers to variable _1, a temporary defined as
"long unsigned int _1;" in the .optimized code for thread1(). I believe
this breaks the dependencies specified by the user, and to handle it we
need to add some code in the cse.c file.

Also, many SSA versions of the variable 'p' get created, as shown in the
.optimized code. They should all carry the _Dependent_ptr qualification,
which they currently do not. I believe simply bypassing the tree passes
through volatile checks won't mark them as dependent-pointer qualified;
for this, we need to tweak the SSA generation pass (tree-ssa.c) somewhat.

Thank you all; please let me know if you find any mistakes in the above.
-Akshat


Re: Doubts regarding the _Dependent_ptr keyword

2019-07-17 Thread Akshat Garg
On Tue, Jul 2, 2019 at 9:06 PM Jason Merrill  wrote:

> On Mon, Jul 1, 2019 at 8:59 PM Paul E. McKenney 
> wrote:
> >
> > On Tue, Jul 02, 2019 at 05:58:48AM +0530, Akshat Garg wrote:
> > > On Tue, Jun 25, 2019 at 9:49 PM Akshat Garg  wrote:
> > >
> > > > On Tue, Jun 25, 2019 at 4:04 PM Ramana Radhakrishnan <
> > > > ramana@googlemail.com> wrote:
> > > >
> > > >> On Tue, Jun 25, 2019 at 11:03 AM Akshat Garg 
> wrote:
> > > >> >
> > > >> > As we have some working front-end code for _Dependent_ptr, what
> > > >> > should we do next? From what I understand, we can start adding
> > > >> > the library for dependent_ptr and its functions for C,
> > > >> > corresponding to the ones we created as a C++ template library.
> > > >> > Then, after that, we can move on to the assembly code generation
> > > >> > part.
> > > >> >
> > > >>
> > > >>
> > > >> I think the next step is figuring out how to model the Dependent
> > > >> pointer information in the IR and figuring out what optimizations
> > > >> to allow or not with that information. At this point, I suspect we
> > > >> need a plan on record and to have the conversation upstream on the
> > > >> lists.
> > > >>
> > > >> I think we need to put down a plan on record.
> > > >>
> > > >> Ramana
> > > >
> > > > [CCing gcc mailing list]
> > > >
> > > > So, shall I start looking over the pointer optimizations only and
> > > > see what information we may need on the same examples in the IR
> > > > itself?
> > > >
> > > > - Akshat
> > > >
> > > I have coded an example, from the document P0190R4, where an
> > > equality comparison kills the dependency, as shown below:
> > >
> > > 1. struct rcutest rt = {1, 2, 3};
> > > 2. void thread0 ()
> > > 3. {
> > > 4.   rt.a = -42;
> > > 5.   rt.b = -43;
> > > 6.   rt.c = -44;
> > > 7.   rcu_assign_pointer(gp, &rt);
> > > 8. }
> > > 9.
> > > 10. void thread1 ()
> > > 11. {
> > > 12.   int i = -1;
> > > 13.   int j = -1;
> > > 14.   _Dependent_ptr struct rcutest *p;
> > > 15.
> > > 16.   p = rcu_dereference(gp);
> > > 17.   j = p->a;
> > > 18.   if (p == &rt)
> > > 19.     i = p->b;  /* Dependency-breaking point */
> > > 20.   else if (p)
> > > 21.     i = p->c;
> > > 22.   assert(i < 0);
> > > 23.   assert(j < 0);
> > > 24. }
> > > The gimple unoptimized code produced for lines 17-24 is shown below:
> > >
> > > 1. if (p_16 == &rt)
> > > 2. goto ; [INV]
> > > 3.   else
> > > 4.goto ; [INV]
> > > 5.
> > > 6.   :
> > > 7.  i_19 = p_16->b;
> > > 8.  goto ; [INV]
> > > 9.
> > > 10.   :
> > > 11.  if (p_16 != 0B)
> > > 12.goto ; [INV]
> > > 13.  else
> > > 14.goto ; [INV]
> > > 15.
> > > 16.   :
> > > 17.  i_18 = p_16->c;
> > > 18.
> > > 19.   :
> > > 20.  # i_7 = PHI 
> > > 21.  _3 = i_7 < 0;
> > > 22.  _4 = (int) _3;
> > > 23.  assert (_4);
> > > 24.  _5 = j_17 < 0;
> > > 25.  _6 = (int) _5;
> > > 26.  assert (_6);
> > > 27.  return;
> > >
> > > The optimized code after -O1 is applied for the same lines is shown
> > > below:
> > >
> > > 1. if (_2 == &rt)
> > > 2.goto ; [30.00%]
> > > 3. else
> > > 4.goto ; [70.00%]
> > > 5.
> > > 6.   [local count: 322122547]:
> > > 7.   i_12 = rt.b;
> > > 8.   goto ; [100.00%]
> > > 9.
> > > 10.   [local count: 751619277]:
> > > 11.   if (_1 != 0)
> > > 12.   goto ; [50.00%]
> > > 13.   else
> > > 14.goto ; [50.00%]
> > > 15.
> > > 16.   [local count: 375809638]:
> > > 17.   i_11 = MEM[(dependent_ptr struct rcutest *)_2].c;
> > > 18.
> > > 19.[local count: 1073741824]:
> > > 20.  # i_7 = PHI 
> > > 21.   _3 = i_7 < 0;
> > > 22.   _4 = (int) _3;
> > > 23.   assert (_4);
> > > 24.  _5 = j_10 < 0;
> > > 25.  _6 = (int) _5;
> > > 26.   assert (_6);
> > > 27.   return;
> >
> > Good show on tracing this through!
> >
> > > Statement 19 in the program gets converted from i_19 = p_16->b;
> > > (line 7 of the unoptimized code) to i_12 = rt.b; (line 7 of the
> > > optimized code), which breaks the dependency chain. We need to figure
> > > out the pass that does that and put some handling code in there for
> > > _Dependent_ptr qualified pointers. Passing just -fipa-pure-const,
> > > -fguess-branch-probability, or any other option alone does not
> > > produce the optimized code that breaks the dependency, but applying
> > > -O1, i.e., allowing all the optimizations, does. As passes are
> > > applied in a certain order, we need to figure out up to which pass
> > > the code remains the same and after which pass the dependency no
> > > longer holds. So, we need to check the translated code after every
> > > pass.
> > >
> > > Does this sound like a workable plan? Let me know your thoughts. If
> > > this sounds good, we can do this for all the optimizations that may
> > > kill the dependencies at some point.
> >
> > I don't know of a better plan.
> >
> > My usual question...  Is there some way to script the checking of the
> > translated code at the end of each pass?
>
> The usual way to check the output of an optimization pass is by
> dumping the intermediate code at that point and matching the dump
> against
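The per-pass check Paul asks about can be sketched as a small script
(this is an assumption-laden illustration: it uses the fig-20 source file
from the other thread, relies on GCC's dump-file naming where the numeric
prefix orders the passes, and uses the rewritten load "rt.b" as the
tell-tale pattern):

```shell
# Dump the IR after every tree and RTL pass, then scan the dumps in pass
# order and report the first one containing the rewritten load "rt.b",
# i.e. the first pass after which the address dependency is gone.
gcc -O1 -fdump-tree-all -fdump-rtl-all -c p0190r4_fig20.c

for f in $(ls p0190r4_fig20.c.* | sort -t. -k3); do
    if grep -q 'rt\.b' "$f"; then
        echo "dependency first rewritten in: $f"
        break
    fi
done
```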

Can LTO minor version be updated in backward compatible way ?

2019-07-17 Thread Romain Geissler
Hi,

SuSE (Martin) announced today that from now on SuSE Tumbleweed will
ship with LTO-built packages by default [1].

That's good news; however, I have a question about how you expect to
support LTO in the future. I have been enabling it in my company for
just a few selected components, and I have run into trouble several times
over the last years. In the LTO section you define both a major version
and a minor version; however, changing either of them will cause an LTO
build to fail if the binaries involved in the link don't all have strictly
the same version. Recently, in gcc 9, we went from version 8.0 to 8.1. In
the past, in gcc 8, I recall I also hit a problem when it went from 7.0
to 7.1. In my case, it meant recompiling a set of, let's say, 100 open
source libraries and around 30 different proprietary libraries (we use
static linking, which is why all libs have to be rebuilt each time we
upgrade gcc to the next minor version). This is still bearable at my
level; I don't have too many dependencies.

However at scale, I think this can become a problem. What will happen
when in gcc 9.3 we change the version to 8.2? Will Tumbleweed recompile
100% of the static libraries it ships? What about all the users of
Tumbleweed having their own private libs with LTO as well? In my company,
I don't advocate LTO at scale (yet) because of this problem in particular:
re-building everything when we release a toolchain with an updated gcc
would be too complex.

I am totally fine with having the major version mismatch as a
showstopper for the link. People will usually not combine a gcc 8 built
binary with a gcc 9 one. However, since we have made a distinction
between major and minor, is it possible to adopt a backward compatible
policy for the minor version? Let's say I have a shiny new gcc 9: it
could combine LTO binaries of both version 8.0 and 8.1. Maybe it could
emit a warning saying it will work in degraded mode, but at least allow
the build to go on.

If maintaining backward-compatible format constraints is too hard inside
a given major gcc release, maybe we can consider another alternative to
failure. If fat objects were used, and the two minor versions really are
incompatible, maybe we can fall back on the non-LTO part for the old
library so that the link still succeeds (though not as optimized as we
would like; most likely warnings would notify about that).

I have no idea about the LTO format or whether it can indeed be easily
updated in a backward compatible way. But I would say it would be nice
if it could, as that would allow adoption by projects spread across many
teams that depend on each other and are unable to re-build everything at
each toolchain update.

Cheers,
Romain

[1] https://lists.opensuse.org/opensuse-factory/2019-07/msg00240.html


Binary Compatibility and Upgrades - GCC C++

2019-07-17 Thread anand.tod...@siemens.com
Hello GCC Team,

I am Anand Todkar, working as a Software Architect at Siemens Corporate
Technology, where I am involved in designing and developing various
platforms for Siemens PLCs and other industrial solutions.

I am currently doing a binary compatibility feasibility study for one of
our C++ libraries, which we are planning to make binary compatible with
the help of the Bridge pattern. Is there a roadmap for the GCC compiler
and its standard libraries covering anything that will impact binary
compatibility in future releases of the compiler?

I searched the developer blog and couldn't specifically find any such
update, but I want to confirm whether there are any roadmap items which
may cause issues, and whether provisions will be made to maintain
compatibility. Any links or documentation in this regard would be highly
appreciated.

I look forward to your response.

With best regards,
Anand Todkar

Siemens Corporation
CT RDA IOT DCO-US
755 College Road East
Princeton, NJ 08540-6632, USA
Mobile: +1 609 216-5346



Re: Can LTO minor version be updated in backward compatible way ?

2019-07-17 Thread Michael Matz
Hi,

On Wed, 17 Jul 2019, Romain Geissler wrote:

> However at scale, I think this can become a problem. What will happen
> when in gcc 9.3 we change the version to 8.2? Will Tumbleweed recompile
> 100% of the static libraries it ships?

Every compiler change causes the whole distro to be rebuilt.  So for us 
the LTO byte stream instability is no problem.

> What about all users of Tumbleweed having their own private libs with 
> LTO as well?

LTO is currently not designed for this use case; you can use fat objects 
to get around the limitation, as you say, but a stable LTO byte stream is 
currently not a focus. With time, though, I do hope that some backward 
compatibility can be achieved, with degraded modes like you suggested.

> I am totally fine with having the major version mismatch as a 
> showstopper for the link. People will usually not combine a gcc 8 built 
> binary with a gcc 9 one.

That's actually not too far off from what people will want to do in the 
future.  Say some HPC vendor ships their libs as static archives, 
containing LTO byte code compiled by gcc 9.  Then a distro user might get 
gcc 10 at some point later, and it's reasonable to expect that the HPC 
libs still work.  We aren't there yet, but we eventually want to be 
there.


Ciao,
Michael.


Re: Can LTO minor version be updated in backward compatible way ?

2019-07-17 Thread Andi Kleen
Romain Geissler  writes:
>
> I have no idea about the LTO format or whether it can indeed be easily
> updated in a backward compatible way. But I would say it would be nice
> if it could, as that would allow adoption by projects spread across many
> teams that depend on each other and are unable to re-build everything at
> each toolchain update.

Right now any change to a compiler option breaks the LTO format
in subtle ways. In fact, even the minor version changes that are
currently done are not frequent enough to catch all such cases.

So it's unlikely to really work.

-Andi



Re: Can LTO minor version be updated in backward compatible way ?

2019-07-17 Thread Jeff Law
On 7/17/19 11:29 AM, Andi Kleen wrote:
> Romain Geissler  writes:
>>
>> I have no idea about the LTO format or whether it can indeed be easily
>> updated in a backward compatible way. But I would say it would be nice
>> if it could, as that would allow adoption by projects spread across many
>> teams that depend on each other and are unable to re-build everything at
>> each toolchain update.
> 
> Right now any change to a compiler option breaks the LTO format
> in subtle ways. In fact, even the minor version changes that are
> currently done are not frequent enough to catch all such cases.
> 
> So it's unlikely to really work.
Right, and stable LTO bytecode really isn't on the radar at this time.

IMHO it's more important right now to start pushing LTO into the
mainstream for the binaries shipped by the vendors (and stripping the
LTO bits out of any static libraries/.o's shipped by the vendors).


SuSE's announcement today is quite ironic.  Red Hat's toolchain team is
planning to propose switching to LTO by default for Fedora 32 and was
working through various details yesterday.  Our proposal will almost
certainly include stripping out the LTO bits from .o's and any static
libraries.

Jeff