[RFC] Whole Program Devirtualization

2021-08-20 Thread Feng Xue OS via Gcc
1. Background

C++ Devirtualization is meant to determine all possible class type
identities of a virtual dynamic call, and try to transform it to direct
static call, or certain kind of statements with lower runtime cost.
Current implementation in GCC tends to adopt a somewhat conservative
scheme, originated from a safe but defensive assumption that class
hierarchy of a type (except for FINAL class or class in anonymous
namespace) is incomplete, and might be extended by external library and
module. Therefore, most of qualified virtual calls would be partially
devirtualized to a speculative form of direct call as the following,
guarded by a function address check and a failsafe branch.

if (obj->vfn_ptr == &T::FN)
obj->T::FN();
else
obj->vfn_ptr();

Here, full devirtualization is required if we want to get vcalls
concisely transformed with no guard test. In theory, when whole program
compilation is enforced, we could enable such kind of devirtualization
based on the closed-world assumption about contained class types. But
"whole program" does not ensure that class hierarchy of a type never
span to dependent C++ libraries (one is libstdc++), which would result
in incorrect devirtualization. An example is given below to demonstrate
the problem. 

// Has been pre-compiled to a library
class Base {
virtual void method() = 0;
};

class Derive_in_Library : public Base {
virtual void method()  { ... }
};

Base *get_object() 
{
return new Derive_in_Library();
}

---
  
// User source code to compile
class Base {
virtual void method() = 0;
};

class Derive : public Base {
virtual void method()  { ... }
};

void foo()
{
  Base *obj = get_object();
 
  obj->method();
}

If there is no way to inspect entire class hierarchy comprised by
relevant types in the library and user source code, whole program
analysis would improperly think of 'Derive' as sole descendant of
'Base', and triggers devirtualizing 'obj->method' to 'Derive::method',
which is definitely unexpected transformation. To target this
complication, we should figure out those types that are referenced by
regular object file and library, and exclude them from candidates of
full devirtualization. This RFC will discuss some implementable
approaches to identify whether type is qualified for full
devirtualization under whole program assumption (so also called whole-
program devirtualization, WPD for short), and new optimizations
inspired by whole program type analysis.


2. Whole-program local type

In this RFC, we merely care about polymorphic class (class contains
virtual function), since only this kind of type is meaningful to
devirtualization. A class type is identified as local in terms of
whole-program scope iff there is no instantiation of the type or its
derived types in regular object files and libraries to link with.
A whole-program local type is safe to apply WPD. Here are two
mechanisms for us to perform this check on a given type: 

2.1 Typeinfo symbol resolution by linker-plugin

Essentially, devirtualization is driven by analysis on vtable lifetime,
which begins once the vtable is attached to a new complete object or
sub-object during instantiation of a class or its derived class.

Seemingly, we could find whether a class type is referenced in object
file or library by tracking availability of its vtable symbol. But
vtable might be purged and we are still interested in its belonging
class type. Refer to the library code in above example, vtable of
'Base' is unused since it neither participate construction of 'Base'
nor 'Derive_in_Library', but we still must know if 'Base' is live
in the library to ensure correct result.

Moreover, once a class is virtual inherited, it will have multiple
vtables (including construction vtables), but the way of looking up
class via vtable symbol requires one-to-one correspondence between
them, then it does not work.

Beside vtable symbol, class instantiation also creates references to
typeinfo symbols of itself and all its parent classes. At the same time,
each class has unique typeinfo, which could act as a desirable type
marker, and be used to perform type lookup in object file and library.
Anyway, the approach is not 100% safe, specifically, when typeinfo
symbols are invisible or missed, for example, when libraries to link
against was built with -fno-weak or -fno-rtti. But at least for
libstc++, these symbols should be present for the sake of allowing
dynamic_cast in user source code.

Lto-linker-plugin will work with linker to do a whole-program symbol
resolution before LTO/WPA happens, and attach binding information to
symbols. Given a typeinfo symbol, if it is resolved as
LDPR_PREVAILING_DEF_IRONLY (only referenced from IR code, with no
references from regular objects), its corresponding class type is
deemed to be whole-pro

GCC [RFC] Whole Program Devirtualization

2021-08-20 Thread Basile Starynkevitch

Hello Feng Xue OS


Your project is interesting, but ambitious.

I think the major points are:

*whole program analysis*. Static analysis tools like 
https://frama-c.com/  or 
https://github.com/bstarynk/bismon/ 
 could be relevant. Projects like 
https://www.decoder-project.eu/  could 
be relevant. With cross-compilation, things are becoming harder.


*abstract interpretation* might be relevant (but difficult and costly to 
implement). See wikipedia.


*size of the whole program which is analyzed*.  If the entire program 
(including system libraries like libc) has e.g. less than ten thousand 
routines and less than a million GIMPLE instructions in total, it make 
sense. But if the entire program is as large as the Linux kernel, or the 
GCC compiler, or the Firefox browser (all have many millions lines of 
source code) you probably won't be able to do whole program 
devirtualization in a few years of human work.



*computed gotos* or *labels as values* (see 
https://gcc.gnu.org/onlinedocs/gcc/Labels-as-Values.html 
 for more) are 
making this difficult. But they do exist, and probably could be hidden 
in GNU glibc or libstdc++ internal code.


*asm**statements are difficult*. They usually appear inside your libc. 
How would you deal with them?


*Can you afford a month of computer time to compile a large software* 
with your whole program devirtualizer? In most cases, not, but Pitrat's 
book /Artificial Beings - the conscience of a conscious machine/ (ISBN 
9781848211018) suggest cases where it might make sense (he is explaining 
a "compiler like system" which runs for a month of CPU time).


My recommendation would be to *code first a simple GCC plugin as a proof 
of concept thing*, which reject programs which could not be 
realistically devirtualized, and store somewhere (in some database 
perhaps) a representation of them otherwise. I worked 3 years full time 
on https://github.com/bstarynk/bismon/ 
 to achieve a similar goal (and I 
don't claim to have succeeded, and I don't have any more funding). My 
guess is that some code could be useful to you (then contact me by email 
both at work basile.starynkevi...@cea.fr and at home 
bas...@starynkevitch.net )


The most important thing: limit your ambition at first. Write a document 
(at least an internal one) stating what you won't do.



Cheers

--
Basile Starynkevitch  
(only mine opinions / les opinions sont miennes uniquement)
92340 Bourg-la-Reine, France
web page: starynkevitch.net/Basile/



Re: [RFC] Whole Program Devirtualization

2021-08-20 Thread Martin Liška

... adding Honza to CC who spent quite some time on devirtualization pass.

Martin


Re: [PATCH] Try LTO partial linking. (Was: Speed of compiling gimple-match.c)

2021-08-20 Thread Martin Liška

On 8/16/21 3:58 PM, Martin Liška wrote:

PING^2

@Honza: Can you please review the change?


I've tested the patch and apparently it's not enough for 
{gimple,generic}-match.o not clashing
in symbol names. Apparently there are more IPA clones that collide.

Leaving that for now.

Martin



Martin

On 6/23/21 3:53 PM, Martin Liška wrote:

On 5/21/21 10:29 AM, Martin Liška wrote:

On 5/20/21 5:55 PM, Jan Hubicka wrote:

Quick solution is to also modify partitioner to use the local symbol
names when doing incremental linking (those mixing in source code and
random seeds) to avoid clashes.


Good hint. I added hash based on object file name (I don't want to handle
proper string escaping) and -frandom-seed.

What do you think about the patch?
Thanks,
Martin


@Honza: Can you please take a look at this patch?

Cheers,
Martin






Re: [PATCH] Try LTO partial linking. (Was: Speed of compiling gimple-match.c)

2021-08-20 Thread Martin Liška

On 6/1/21 3:19 PM, Richard Biener wrote:

On Tue, Jun 1, 2021 at 1:25 PM Martin Liška  wrote:


On 6/1/21 9:42 AM, Richard Biener wrote:

On Tue, Jun 1, 2021 at 9:33 AM Martin Liška  wrote:


@Richi: Can you please reply to this email?


Not sure what I should add here?  Honza suggested to mangle the
promoted symbol names.


Sure and I sent a patch for that.


I don't
really like the idea to compile multiple TUs into one object.  Also


What's problematic is that we'll have to wait for one another release to make 
it useful
(if you don't want to build the current master with a snapshot compiler).


IMHO it's a bugfix.  Note that I'm not sure what the intent of the change is.
If it is to speedup bootstrap then using LTO bootstrap would do the trick
as well (and better) if we'd simply process all of libbackend.a this way
(and thus avoid re-linking that once for each frontend).


When building a GCC package, we intentionally re-link them with all FEs.


If it is to speedup
dev (re-)builds then dragging in more files will make it build longer since
for example insn-recog.c may be unchanged but gimple-match.c not.


Yes, the original motivation was a speed up of a dev. build and yes, the shown
example is problematic. Right now, I'm leaving that as I'm not interested enough
in the parallel build of a simple source file.

Martin




+LTO_LINKER_FLAGS = -flto=auto --param=lto-partitions=16
-flinker-output=nolto-rel -r

why hard-code to 16 partitions?  You're side-stepping the driver
diagnostic by doing
compile & link separately, but in the end we're going to want sth like Giulianos
-fparallel-compile that works transparently from within the driver, so
the "manual"
operation should try to follow that or alternatively a driver-only
wrapper around the
"manual" processing could be added whose implementation can be optimized later.


All right. Do you want me refreshing his -fparallel-compile option introduction?


I'm not sure if we've arrived at mergeable state - but if it's
reasonably possible
to hide s/-fparallel-compile/-flto -r -flinker-output=nolto-rel/ split
into compile & link
parts (avoiding the diagnostic on -flinker-output) in the driver then
I think that's
a very reasonable first step (after fixing the symbol privatization issue).  The
GSOC project then was to elide the IL streaming from the high-level operation.

Richard,



Why do you use -flto=auto?  There should be a jobserver active.


Yes, that should not be needed.

Martin




On 5/21/21 10:43 AM, Martin Liška wrote:

On 5/20/21 2:54 PM, Richard Biener wrote:

On Thu, May 20, 2021 at 2:34 PM Martin Liška  wrote:


Hello.

I've got a patch candidate that leverages partial linking for a couple of 
selected object files.

I'm sending make all-host- jX results for my machine:

before: 3m18s (user 32m52s)
https://gist.githubusercontent.com/marxin/223890df4d8d8e490b6b2918b77dacad/raw/1dd5eae5001295ba0230a689f7edc67284c9b742/gcc-all-host.svg

after: 2m57m (user 35m)
https://gist.githubusercontent.com/marxin/223890df4d8d8e490b6b2918b77dacad/raw/d659b2187cf622167841efbbe6bc93cb33855fa9/gcc-all-host-partial-lto.svg

One can utilize it with:
make -j16 all-host PARTIAL_LTO=1

@Segher, Andrew: Can you please measure time improvement for your slow 
bootstrap?
One can also tweak --param=lto-partitions=16 param value.

Thoughts?


You're LTO linking multiple objects here - that's almost as if you
were doing this
for the whole of libbackend.a ... so $(OBJS)_CLFAGS += -flto and in the
libbackend.a rule do a similar partial link trick.


Yeah, apart from that one can't likely do partial linking for an archive:

$ g++ -no-pie -flto=auto --param=lto-partitions=16 -flinker-output=nolto-rel -r 
libbackend.a
collect2: fatal error: ld terminated with signal 11 [Segmentation fault], core 
dumped
compilation terminated.

while ld.bfd immediately finishes.



That gets you half of a LTO bootstrap then.

So why did you go from applying this per-file to multiple files?  Does $(LINKER)
have a proper rule to pick up a jobserver?

When upstreaming in any form you probably have to gate it on bootstrap-lto
being not active.


Sure, that's reasonable, we can likely detect a -flto option in $(COMPILE), 
right?

One more thing I face is broken dependency:
$ make clean && make -j32 PARTIAL_LTO=1

g++ -fcf-protection -fno-PIE -c   -g   -DIN_GCC -fPIC-fno-exceptions 
-fno-rtti -fasynchronous-unwind-tables -W -Wall -Wno-narrowing -Wwrite-strings 
-Wcast-qual -Wno-error=format-diag -Wmissing-format-attribute 
-Woverloaded-virtual -pedantic -Wno-long-long -Wno-variadic-macros 
-Wno-overlength-strings -fno-common -Wno-unused -DHAVE_CONFIG_H -I. -I. 
-I/home/marxin/Programming/gcc/gcc -I/home/marxin/Programming/gcc/gcc/. 
-I/home/marxin/Programming/gcc/gcc/../include 
-I/home/marxin/Programming/gcc/gcc/../libcpp/include 
-I/home/marxin/Programming/gcc/gcc/../libcody  
-I/home/marxin/Programming/gcc/gcc/../libdecnumber 
-I/home/marxin/Programming/gcc/gcc/../libdecnumber/bid -I../libdecnumbe

GNU Tools @ LPC 2021: 10 days left to submit

2021-08-20 Thread Jeremy Bennett
Hi all,

A gentle reminder that the deadline for the GNU Tools track at Linux
Plumbers Conference 2021 is 31 August. We are using the LPC submission
system to keep things simple this year. Unlike Cauldron, we won't be
able to take late submissions, since we have to tie in with the LPC 2021
infrastructure.

A reminder that the conference itself is virtual and takes place 20-24
September. You will need to register and buy a ticket ($50) if you wish
to attend.

The scope is the same as Cauldron, with talks, lightning talks,
tutorials and BoF sessions across all GNU tools: GlibC, GDB, binutils,
CGEN, DejaGnu, Newlib, GCC etc. We'll be using BigBlueButton as the
platform, and to try to fit in as many talks as possible, we'll be
restricting sessions to 25 minutes (10 minutes for lightning talks).

Full details of the conference and how to submit on the Wiki:

  https://gcc.gnu.org/wiki/linuxplumbers2021

We look forward to seeing you (virtually) there. Any questions to
tools-cauldron-ad...@googlegroups.com, discussion and future
announcements to gcc@gcc.gnu.org.

Best wishes,


Jeremy Bennett
on behalf of the organizing committee
-- 
Cell: +44 (7970) 676050
SkypeID: jeremybennett
Twitter: @jeremypbennett
Email: jeremy.benn...@embecosm.com
Web: www.embecosm.com
PGP key: 1024D/BEF58172FB4754E1 2009-03-20



OpenPGP_signature
Description: OpenPGP digital signature


gcc-10-20210820 is now available

2021-08-20 Thread GCC Administrator via Gcc
Snapshot gcc-10-20210820 is now available on
  https://gcc.gnu.org/pub/gcc/snapshots/10-20210820/
and on various mirrors, see http://gcc.gnu.org/mirrors.html for details.

This snapshot has been generated from the GCC 10 git branch
with the following options: git://gcc.gnu.org/git/gcc.git branch 
releases/gcc-10 revision 3e1daf46f483440327ef607b3eeed339b0c41dba

You'll find:

 gcc-10-20210820.tar.xz   Complete GCC

  SHA256=1af36c0820c0121323dab90c41e5030b8d7643bd410723d56bfeedcd31c81937
  SHA1=c1cefbcbe92ddb8d1c953b32ec1fc7ebe5544a3d

Diffs from 10-20210813 are available in the diffs/ subdirectory.

When a particular snapshot is ready for public consumption the LATEST-10
link is updated and a message is sent to the gcc list.  Please do not use
a snapshot before it has been announced that way.