pcc added a comment.

For reference, here is the text of the proposal I sent to cxx-abi-dev:

Hi all,

The ABI currently requires that virtual tables for a class appear consecutively 
in a virtual table group. I would like to propose a restriction under which
compilers may only access the virtual table associated with the address point
stored in an object's virtual table pointer, and may not rely on any knowledge
they may have about the relative layout of other virtual tables in the virtual
table group.

The purpose of this restriction is to allow an implementation to split a 
virtual table group along virtual table boundaries.

Motivation

There are at least two scenarios which would benefit from vtable splitting: 
clients which want to place data either before or after the ABI-required part 
of a virtual table, and clients which want to control the layout of virtual 
tables for performance or security reasons.

As an example of the first scenario, when performing whole-program virtual call 
optimization, Clang will apply an optimization known as virtual constant 
propagation [0], which causes data to be laid out at a specific offset from the 
address point of each virtual table in a hierarchy. If that virtual table 
appears in a virtual table group, padding is required to place the data at an 
appropriate offset for each class. Because of the current restriction that 
vtables must appear consecutively, the optimizer may need to add more padding 
than necessary, or inhibit the optimization entirely if it would require too 
much padding.
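
To illustrate the kind of call that virtual constant propagation targets, here
is a minimal sketch. The class and function names are hypothetical and are not
taken from Clang or Chromium. Every override returns a compile-time constant,
so a whole-program optimizer could store that constant at a fixed offset from
each address point and replace the call with a load.

```cpp
// Hypothetical hierarchy; all names are illustrative assumptions.
struct Node {
  virtual ~Node() = default;
  // Each override returns a constant, so the result could be stored next to
  // the virtual table instead of being computed by a call.
  virtual bool is_leaf() const { return false; }
};

struct Leaf : Node {
  bool is_leaf() const override { return true; }
};

// With the optimization applied, this virtual call could become a load from
// a fixed offset relative to the address point in n's virtual table pointer.
bool query_is_leaf(const Node &n) { return n.is_leaf(); }
```

If Leaf also had a second polymorphic base, its virtual tables would form a
group, which is the case where the padding described above comes into play.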

As an example of the second scenario, an implementation may wish to lay out 
virtual tables hierarchically either in order to increase the likelihood of a 
cache hit when repeatedly making the same virtual call over a set of 
heterogeneous objects, or to efficiently implement a security mitigation 
(specifically control flow integrity [1]) based on checking virtual table 
addresses for set membership. Placing only virtual tables (rather than virtual 
table groups) consecutively would likely increase the cache hit likelihood 
further and reduce the amount of metadata required to implement set membership 
checks.
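
As a rough sketch of why consecutive layout helps set membership checks,
assume the valid address points for a class hierarchy sit at a fixed stride
starting from a known first address. The helper below is hypothetical and is
not how Clang's implementation is written; it only shows that membership then
reduces to a range check plus an alignment check.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical helper (an assumption for illustration): returns true if
// `vptr` is one of `count` address points spaced `stride` bytes apart,
// starting at `first`. `stride` must be nonzero.
bool in_address_point_range(std::uintptr_t vptr, std::uintptr_t first,
                            std::size_t stride, std::size_t count) {
  if (vptr < first) return false;           // below the range
  std::uintptr_t delta = vptr - first;
  return delta % stride == 0                // lands exactly on an address point
         && delta / stride < count;         // and within the range
}
```

When the valid address points are not evenly spaced, for example because
unrelated virtual tables from the same group sit between them, a check like
this needs extra metadata to describe the gaps.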

In an experiment involving the Chromium web browser, I have measured a binary 
size decrease of 1.5%, and a median performance improvement of about 1% on 
Chromium's layout benchmarks when comparing a binary compiled with control flow 
integrity and whole-program virtual call optimization against a binary compiled 
with control flow integrity, whole-program virtual call optimization and a 
prototype implementation of vtable splitting.

Commentary

Although the ABI specifies [2] the calling convention for virtual calls, which 
requires the call to be made using the this-adjustment appropriate for the 
object from which the virtual table pointer was loaded, the as-if rule could in 
principle allow a program to make a call using a different virtual table if the 
virtual table group contains multiple secondary virtual tables, as the distance 
between these virtual tables would be fixed (the same would be possible for all 
virtual tables if the dynamic type were known, but in that case the program 
could just call the appropriate virtual function directly).

The purported benefit would be to avoid an additional virtual table pointer load
from the object in cases where consecutive calls are made to virtual functions 
introduced in different bases. However, it seems to me that cases where this is 
beneficial would be rare: not only would you need at least three bases and a 
derived class which does not override any of the called virtual functions, but 
when performing two consecutive calls it seems likely that the vtable would 
need to be reloaded anyway, either from the object or from the stack, 
especially with majority caller-save ABIs such as x86-64, or in any event 
because the first virtual call may have changed the object's dynamic type. It 
seems (according to experiments [3] carried out at godbolt.org) that all major 
compilers (gcc, clang, icc) do already use the appropriate vtable group and 
therefore are compliant with the proposed restriction.
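
For concreteness, here is a minimal sketch of the three-bases situation
described above. The names are hypothetical, and this is not the godbolt test
case itself. D overrides none of the called functions, and the two calls go
through virtual functions introduced in different bases.

```cpp
// Hypothetical three-base hierarchy; D overrides none of the virtual functions.
struct A { virtual int f() { return 1; } };
struct B { virtual int g() { return 2; } };
struct C { virtual int h() { return 3; } };
struct D : A, B, C {};

int call_g_then_h(D *d) {
  // Under the proposed restriction, each call loads the virtual table pointer
  // of its own subobject: the B subobject for g(), the C subobject for h().
  // What would be disallowed is deriving the address of the C-in-D virtual
  // table from the B-in-D address point by assuming a fixed distance.
  int first = d->g();
  int second = d->h();
  return first * 10 + second;
}
```

Because the dynamic type of *d is not known here, only the secondary virtual
tables (for B and C) could have been assumed to lie at a fixed distance from
each other.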

(There would also seem to be nothing preventing an implementation from choosing 
to load the RTTI pointer or offset-to-top from another virtual table group. 
However I would consider this even less likely to be beneficial than a virtual 
call via another virtual table.)

The ABI specifies that the vtables in a group shall be laid out consecutively 
when referenced via a vtable group symbol, and I'm not proposing to change 
this. The effect of this proposal would be to allow a vtable to be split if the 
vtable group symbol is not referenced directly by name outside of the 
translation unit(s) participating in the optimization. This may be the case 
when a class has internal linkage, or if the program is linked with LTO, which 
allows the compiler to know which symbols are referenced outside of the LTO'd 
part of the program.
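
As an illustration of the internal-linkage case, here is a minimal sketch with
hypothetical names. The classes live in an unnamed namespace, so their virtual
table group symbols cannot be referenced by name from another translation
unit, and an implementation following this proposal would be free to split
Triangle's group.

```cpp
namespace {  // unnamed namespace: the classes below have internal linkage

struct Shape { virtual int sides() const = 0; };
struct Tagged { virtual int tag() const = 0; };

// Triangle's virtual table group holds a primary virtual table (shared with
// Shape) and a secondary virtual table for the Tagged subobject.
struct Triangle final : Shape, Tagged {
  int sides() const override { return 3; }
  int tag() const override { return 7; }
};

}  // namespace

// Externally visible entry point; only this symbol leaves the translation unit.
int triangle_summary() {
  Triangle t;
  const Shape &s = t;   // call goes through the primary virtual table
  const Tagged &g = t;  // call goes through the secondary virtual table
  return s.sides() * 10 + g.tag();
}
```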

Wording

I propose to add two paragraphs to the section of the ABI describing virtual 
table groups, as follows:

diff --git a/abi.html b/abi.html
index 79cda2c..fce0c60 100644
--- a/abi.html
+++ b/abi.html
@@ -1193,6 +1193,18 @@ and again excluding primary bases
 (which share virtual tables with the classes for which they are primary).
 </ul>

+<p>
+When performing a virtual call or loading any other data from an address
+derived from the address point stored in an object's virtual table pointer,
+a program may only load from the virtual table associated with that address
+point, and not from any other virtual table in the same virtual table group
+which might be presumed to be located at a fixed offset from the address
+point as a result of the above layout algorithm.
+
+<p>
+The purpose of this restriction is to allow an implementation to split a
+virtual table group along virtual table boundaries if its symbol is not
+visible to other translation units.

 <p>
 <a name="vtable-construction">

Thanks,
Peter

[0] http://lists.llvm.org/pipermail/llvm-dev/2016-January/094600.html
[1] http://clang.llvm.org/docs/ControlFlowIntegrityDesign.html
[2] https://mentorembedded.github.io/cxx-abi/abi.html#vcall.caller
[3] https://godbolt.org/g/wX7Ay6 is a three-bases test case by Richard Smith, 
https://godbolt.org/g/7eG8A1 is a dynamic-type-known test case by me


https://reviews.llvm.org/D22296


