================
@@ -103,27 +112,226 @@ static cl::opt<bool>
ICPDUMPAFTER("icp-dumpafter", cl::init(false), cl::Hidden,
cl::desc("Dump IR after transformation happens"));
+// Indirect call promotion pass will fall back to function-based comparison if
+// vtable-count / function-count is smaller than this threshold.
+static cl::opt<float> ICPVTablePercentageThreshold(
+ "icp-vtable-percentage-threshold", cl::init(0.99), cl::Hidden,
+ cl::desc("The percentage threshold of vtable-count / function-count for "
+ "cost-benefit analysis."));
+
+// Although comparing vtables can save a vtable load, we may need to compare
+// the vtable pointer with multiple vtable address points due to class
+// inheritance. Comparing with multiple vtables inserts additional instructions
+// on the hot code path, and doing so for an earlier candidate of one icall can
+// affect later function candidates in an undesired way. We allow multiple
+// vtable comparison
----------------
minglotus-6 wrote:
> I think what you mean is that doing so for an earlier candidate delays the
> comparisons for later candidates, but that for the last candidate, only the
> fallback path is affected?
Yes. I updated the comment.
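To make the earlier-versus-last distinction concrete, here is a toy C++ model (not the code ICP actually emits). Each candidate is a list of hypothetical vtable address points, and `comparisonsUntilDispatch` counts how many pointer comparisons run before dispatch is decided. Extra address points on an earlier candidate add compares to every later candidate's path. Extra address points on the last candidate delay no other candidate: only the last candidate's own objects and the fallback path see them. The names `Candidate` and `comparisonsUntilDispatch` are invented for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy model of a promoted icall: the direct callee for a candidate is taken
// if the object's vtable pointer equals any of the candidate's (hypothetical)
// vtable address points; otherwise the next candidate is tried, and finally
// the original indirect call runs as the fallback.
struct Candidate {
  std::vector<const void*> AddressPoints;
};

// Returns the number of pointer comparisons executed before dispatch is
// decided for an object whose vtable pointer is VPtr.
std::size_t comparisonsUntilDispatch(const std::vector<Candidate>& Cands,
                                     const void* VPtr) {
  std::size_t Compares = 0;
  for (const Candidate& C : Cands) {
    for (const void* AP : C.AddressPoints) {
      ++Compares;
      if (VPtr == AP)
        return Compares;  // direct call to this candidate
    }
  }
  return Compares;  // fell through to the original indirect call
}
```

For example, if the candidate with two address points comes first, an object dispatched to the second candidate needs three compares; putting the two-vtable candidate last drops that to one.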
> Do we expect to set this parameter above 1?
Yes. Setting it to 1 keeps the default conservative. Based on my tests on
`-pie` or `pie` binaries, setting it to 2 gives a measurable performance win
compared with 1, while setting it to 3 doesn't give stable performance wins
across different binaries or across runs.
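The tuning above can be pictured with a hypothetical profitability check in C++. For illustration only, it assumes three things. Earlier candidates must compare against exactly one vtable. The last candidate may compare against up to `MaxVTablesLastCandidate` address points, with 1 as the conservative default discussed here. Each candidate's vtable count must reach `-icp-vtable-percentage-threshold` (0.99 in the diff) of its function count, or ICP falls back to function comparison. The struct, function, and parameter names are invented and do not reflect the patch's actual code.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical per-candidate profile data (illustrative names only).
struct CandidateProfile {
  std::uint64_t FunctionCount;  // profiled calls to this callee
  std::uint64_t VTableCount;    // calls covered by the vtables we'd compare
  std::size_t NumVTables;       // vtable address points to compare against
};

// Stand-in for the default of -icp-vtable-percentage-threshold in the diff.
constexpr float kVTablePercentageThreshold = 0.99f;

// Returns true if vtable-based comparison is assumed profitable for every
// candidate; false means falling back to function-based comparison.
bool isProfitableToCompareVTables(const std::vector<CandidateProfile>& Cands,
                                  std::size_t MaxVTablesLastCandidate = 1,
                                  float Threshold = kVTablePercentageThreshold) {
  if (Cands.empty())
    return false;
  for (std::size_t I = 0; I < Cands.size(); ++I) {
    const CandidateProfile& C = Cands[I];
    const bool IsLast = I + 1 == Cands.size();
    // Earlier candidates get one compare; only the last may get more.
    const std::size_t Cap = IsLast ? MaxVTablesLastCandidate : 1;
    if (C.NumVTables == 0 || C.NumVTables > Cap)
      return false;
    // Vtable-count / function-count below the threshold: not worth it.
    if (static_cast<double>(C.VTableCount) <
        static_cast<double>(Threshold) * static_cast<double>(C.FunctionCount))
      return false;
  }
  return true;
}
```

Under these assumptions, a last candidate with two vtables is rejected at the default cap of 1 but accepted with a cap of 2.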
One interesting thing is that the actual cost of materializing one vtable
address point depends on the compile option (`-fpic`/`-fpie`), and the costs of
materializing a vtable address point and a function address are comparable
under the same `-fpie`/`-fpic` option.
* For non-pie binaries, `@vtable + address-point-offset` is lowered to an
immediate representing the vtable address point. It can be folded into the
compare instruction that the `icmp` is lowered to, something like
`icmp #imm, <reg>`.
* For pie (but non-pic) binaries, `@vtable + address-point-offset` is lowered
to a pc-relative address, so it takes one instruction to materialize the
pc-relative address itself (something like
`leaq 2890849(%rip), %rdx # 0x30fe50 <_ZTV8Derived1>` for x86).
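As a rough illustration of the two cases, the toy C++ cost model below counts instructions on a path that compares a vtable pointer against N address points. Under non-pie, the address point is an immediate folded into the compare. Under pie (non-pic), one pc-relative `leaq` per address point materializes it first. This is a simplification assumed for illustration: x86-64, one compare per address point, branches ignored, no reuse of materialized addresses. It is not a measurement of real codegen. The enum and function names are hypothetical.

```cpp
#include <cassert>

// Relocation model of the binary. PIC is deliberately left out because only
// the non-pie and pie (non-pic) cases are spelled out above.
enum class Relocation { NonPIE, PIE };

// Instructions needed to materialize one vtable address point before the
// compare (toy assumption based on the x86 examples above).
constexpr int materializeCost(Relocation R) {
  switch (R) {
    case Relocation::NonPIE:
      return 0;  // immediate folded into the compare
    case Relocation::PIE:
      return 1;  // one pc-relative leaq
  }
  return 0;  // not reached for valid enumerators
}

// Instructions on a path comparing against NumAddressPoints address points:
// one compare each, plus materialization where the model says it's needed.
constexpr int instrsForCompares(Relocation R, int NumAddressPoints) {
  return NumAddressPoints * (1 + materializeCost(R));
}
```

With two address points, the model gives 2 instructions for non-pie and 4 for pie, which is one way to see why each extra vtable comparison costs more in pie binaries.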
https://github.com/llvm/llvm-project/pull/81442
_______________________________________________
cfe-commits mailing list
[email protected]
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits