[PATCH] D43430: Omit nullptr check for sufficiently simple delete-expressions

2018-02-17 Thread Andrew Hunter via Phabricator via cfe-commits
ahh created this revision.
ahh added a reviewer: rsmith.
Herald added a subscriber: cfe-commits.

[expr.delete] is pretty crystal clear that it's OK to invoke a
deallocation-function on a nullptr delete-expression:

"If the value of the operand of the delete-expression is a null
pointer value, it is unspecified whether a deallocation function will
be called as described above."

Even so, we currently check against nullptr unconditionally. This is
wasteful for anything with a trivial destructor; deleting nullptr is
very rare so it's not worth saving the call to ::operator delete, and
this is a useless branch (and waste of code size) in the common case.

Condition emission of the branch on us needing to actually look
through the pointer for a vtable, size cookie, or nontrivial
destructor.  (In principle a nontrivial destructor with no side
effects that didn't touch the object would also be safe to run
unconditionally, but I don't know how to test we have one of those and
who in the world would write one?)

On an important and very large (~500 MiB) Google search binary, this
saves approximately 32 KiB of text. Okay, it's not impressive in a
relative sense, but it's still the right thing to do.

A note on optimization: we still successfully elide delete-expressions of
(literal) nullptr.  Where before they were stuck behind a never-taken branch,
now they reduce (effectively) to calls to __builtin_operator_delete(nullptr),
which EarlyCSE is capable of optimizing out. So this has no cost in the
already well-optimized case.


Repository:
  rC Clang

https://reviews.llvm.org/D43430

Files:
  lib/CodeGen/CGCXXABI.h
  lib/CodeGen/CGExprCXX.cpp
  test/CodeGenCXX/cxx2a-destroying-delete.cpp
  test/CodeGenCXX/delete-two-arg.cpp
  test/CodeGenCXX/delete.cpp

Index: test/CodeGenCXX/delete.cpp
===
--- test/CodeGenCXX/delete.cpp
+++ test/CodeGenCXX/delete.cpp
@@ -9,7 +9,12 @@
 };
 
 // POD types.
+// CHECK-LABEL: define void @_Z2t3P1S
 void t3(S *s) {
+  // CHECK-NOT: icmp eq {{.*}}, null
+  // CHECK-NOT: br i1
+  // CHECK: call void @_ZdlPv
+
   delete s;
 }
 
@@ -99,6 +104,8 @@
 
 namespace test3 {
   void f(int a[10][20]) {
+// CHECK-NOT: icmp eq {{.*}}, null
+// CHECK-NOT: br i1
 // CHECK: call void @_ZdaPv(i8*
 delete a;
   }
@@ -113,6 +120,8 @@
 
   // CHECK-LABEL: define void @_ZN5test421global_delete_virtualEPNS_1XE
   void global_delete_virtual(X *xp) {
+// CHECK: icmp eq {{.*}}, null
+// CHECK: br i1
 //   Load the offset-to-top from the vtable and apply it.
 //   This has to be done first because the dtor can mess it up.
 // CHECK:  [[T0:%.*]] = bitcast [[X:%.*]]* [[XP:%.*]] to i64**
Index: test/CodeGenCXX/delete-two-arg.cpp
===
--- test/CodeGenCXX/delete-two-arg.cpp
+++ test/CodeGenCXX/delete-two-arg.cpp
@@ -9,8 +9,8 @@
   // CHECK-LABEL: define void @_ZN5test11aEPNS_1AE(
   void a(A *x) {
 // CHECK:  load
-// CHECK-NEXT: icmp eq {{.*}}, null
-// CHECK-NEXT: br i1
+// CHECK-NOT: icmp eq {{.*}}, null
+// CHECK-NOT: br i1
 // CHECK:  call void @_ZN5test11AdlEPvj(i8* %{{.*}}, i32 4)
 delete x;
   }
Index: test/CodeGenCXX/cxx2a-destroying-delete.cpp
===
--- test/CodeGenCXX/cxx2a-destroying-delete.cpp
+++ test/CodeGenCXX/cxx2a-destroying-delete.cpp
@@ -15,9 +15,8 @@
 void delete_A(A *a) { delete a; }
 // CHECK-LABEL: define {{.*}}delete_A
 // CHECK: %[[a:.*]] = load
-// CHECK: icmp eq %{{.*}} %[[a]], null
-// CHECK: br i1
-//
+// CHECK-NOT: icmp eq %{{.*}} %[[a]], null
+// CHECK-NOT: br i1
 // Ensure that we call the destroying delete and not the destructor.
 // CHECK-NOT: call
 // CHECK-ITANIUM: call void @_ZN1AdlEPS_St19destroying_delete_t(%{{.*}}* %[[a]])
@@ -60,8 +59,8 @@
 // CHECK: %[[castbase:.*]] = bitcast {{.*}} %[[base]]
 //
 // CHECK: %[[a:.*]] = phi {{.*}} %[[castbase]]
-// CHECK: icmp eq %{{.*}} %[[a]], null
-// CHECK: br i1
+// CHECK-NOT: icmp eq %{{.*}} %[[a]], null
+// CHECK-NOT: br i1
 //
 // CHECK-NOT: call
 // CHECK-ITANIUM: call void @_ZN1AdlEPS_St19destroying_delete_t(%{{.*}}* %[[a]])
Index: lib/CodeGen/CGExprCXX.cpp
===
--- lib/CodeGen/CGExprCXX.cpp
+++ lib/CodeGen/CGExprCXX.cpp
@@ -1971,26 +1971,58 @@
   CGF.PopCleanupBlock();
 }
 
-void CodeGenFunction::EmitCXXDeleteExpr(const CXXDeleteExpr *E) {
-  const Expr *Arg = E->getArgument();
-  Address Ptr = EmitPointerWithAlignment(Arg);
+// Will we read through the deleted pointer?  If so,
+// we must first check it is not null.
+static bool DeleteMightAccessObject(CodeGenFunction &CGF,
+  const CXXDeleteExpr *E,
+  QualType DeleteTy) {
 
-  // Null check the pointer.
-  llvm::BasicBlock *DeleteNotNull = createBasicBlock("delete.notnull");
-  llvm::BasicBlock *

[PATCH] D43430: Omit nullptr check for sufficiently simple delete-expressions

2018-02-17 Thread Andrew Hunter via Phabricator via cfe-commits
ahh added a comment.

On my workstation's checkout of head, one test fails (Clang :: 
Driver/response-file.c) both with and without this change; everything else 
appears to pass.

I believe that between the tests I add to delete.cpp and the ones that are 
already there (and destroying-delete.cpp) we cover every case that has to get a 
nullptr check, and pretty much every one that should *not*.

Name of the helper function is, uh, good enough for me, but no objections to 
changing it.


Repository:
  rC Clang

https://reviews.llvm.org/D43430



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D43430: Omit nullptr check for sufficiently simple delete-expressions

2018-02-18 Thread Andrew Hunter via Phabricator via cfe-commits
ahh updated this revision to Diff 134846.
ahh added a comment.

Fix indentation


Repository:
  rC Clang

https://reviews.llvm.org/D43430

Files:
  lib/CodeGen/CGCXXABI.h
  lib/CodeGen/CGExprCXX.cpp
  test/CodeGenCXX/cxx2a-destroying-delete.cpp
  test/CodeGenCXX/delete-two-arg.cpp
  test/CodeGenCXX/delete.cpp

Index: test/CodeGenCXX/delete.cpp
===
--- test/CodeGenCXX/delete.cpp
+++ test/CodeGenCXX/delete.cpp
@@ -9,7 +9,12 @@
 };
 
 // POD types.
+// CHECK-LABEL: define void @_Z2t3P1S
 void t3(S *s) {
+  // CHECK-NOT: icmp eq {{.*}}, null
+  // CHECK-NOT: br i1
+  // CHECK: call void @_ZdlPv
+
   delete s;
 }
 
@@ -99,6 +104,8 @@
 
 namespace test3 {
   void f(int a[10][20]) {
+// CHECK-NOT: icmp eq {{.*}}, null
+// CHECK-NOT: br i1
 // CHECK: call void @_ZdaPv(i8*
 delete a;
   }
@@ -113,6 +120,8 @@
 
   // CHECK-LABEL: define void @_ZN5test421global_delete_virtualEPNS_1XE
   void global_delete_virtual(X *xp) {
+// CHECK: icmp eq {{.*}}, null
+// CHECK: br i1
 //   Load the offset-to-top from the vtable and apply it.
 //   This has to be done first because the dtor can mess it up.
 // CHECK:  [[T0:%.*]] = bitcast [[X:%.*]]* [[XP:%.*]] to i64**
Index: test/CodeGenCXX/delete-two-arg.cpp
===
--- test/CodeGenCXX/delete-two-arg.cpp
+++ test/CodeGenCXX/delete-two-arg.cpp
@@ -9,8 +9,8 @@
   // CHECK-LABEL: define void @_ZN5test11aEPNS_1AE(
   void a(A *x) {
 // CHECK:  load
-// CHECK-NEXT: icmp eq {{.*}}, null
-// CHECK-NEXT: br i1
+// CHECK-NOT: icmp eq {{.*}}, null
+// CHECK-NOT: br i1
 // CHECK:  call void @_ZN5test11AdlEPvj(i8* %{{.*}}, i32 4)
 delete x;
   }
Index: test/CodeGenCXX/cxx2a-destroying-delete.cpp
===
--- test/CodeGenCXX/cxx2a-destroying-delete.cpp
+++ test/CodeGenCXX/cxx2a-destroying-delete.cpp
@@ -15,9 +15,8 @@
 void delete_A(A *a) { delete a; }
 // CHECK-LABEL: define {{.*}}delete_A
 // CHECK: %[[a:.*]] = load
-// CHECK: icmp eq %{{.*}} %[[a]], null
-// CHECK: br i1
-//
+// CHECK-NOT: icmp eq %{{.*}} %[[a]], null
+// CHECK-NOT: br i1
 // Ensure that we call the destroying delete and not the destructor.
 // CHECK-NOT: call
 // CHECK-ITANIUM: call void @_ZN1AdlEPS_St19destroying_delete_t(%{{.*}}* %[[a]])
@@ -60,8 +59,8 @@
 // CHECK: %[[castbase:.*]] = bitcast {{.*}} %[[base]]
 //
 // CHECK: %[[a:.*]] = phi {{.*}} %[[castbase]]
-// CHECK: icmp eq %{{.*}} %[[a]], null
-// CHECK: br i1
+// CHECK-NOT: icmp eq %{{.*}} %[[a]], null
+// CHECK-NOT: br i1
 //
 // CHECK-NOT: call
 // CHECK-ITANIUM: call void @_ZN1AdlEPS_St19destroying_delete_t(%{{.*}}* %[[a]])
Index: lib/CodeGen/CGExprCXX.cpp
===
--- lib/CodeGen/CGExprCXX.cpp
+++ lib/CodeGen/CGExprCXX.cpp
@@ -1971,26 +1971,57 @@
   CGF.PopCleanupBlock();
 }
 
-void CodeGenFunction::EmitCXXDeleteExpr(const CXXDeleteExpr *E) {
-  const Expr *Arg = E->getArgument();
-  Address Ptr = EmitPointerWithAlignment(Arg);
+// Will we read through the deleted pointer?  If so,
+// we must first check it is not null.
+static bool DeleteMightAccessObject(CodeGenFunction &CGF,
+const CXXDeleteExpr *E, QualType DeleteTy) {
 
-  // Null check the pointer.
-  llvm::BasicBlock *DeleteNotNull = createBasicBlock("delete.notnull");
-  llvm::BasicBlock *DeleteEnd = createBasicBlock("delete.end");
+  if (E->getOperatorDelete()->isDestroyingOperatorDelete()) {
+// It is safe to call destroying operator delete with nullptr arguments
+// ([expr.delete] tells us it is unspecified whether a deallocation
+// function is called) but a virtual destructor must be resolved
+// to find the right function, which we can't do on nullptr.
+auto *Dtor = DeleteTy->getAsCXXRecordDecl()->getDestructor();
+return Dtor && Dtor->isVirtual();
+  }
 
-  llvm::Value *IsNull = Builder.CreateIsNull(Ptr.getPointer(), "isnull");
+  if (E->isArrayForm()) {
+return CGF.CGM.getCXXABI().requiresArrayCookie(E, DeleteTy);
+  }
 
-  Builder.CreateCondBr(IsNull, DeleteEnd, DeleteNotNull);
-  EmitBlock(DeleteNotNull);
+  // Otherwise, we should avoid invoking any nontrivial destructor on
+  // a null object.
+  return DeleteTy.isDestructedType();
+}
 
+void CodeGenFunction::EmitCXXDeleteExpr(const CXXDeleteExpr *E) {
+  const Expr *Arg = E->getArgument();
+  Address Ptr = EmitPointerWithAlignment(Arg);
   QualType DeleteTy = E->getDestroyedType();
 
+  // Null check the pointer, unless the destructor is trivial.  In that case,
+  // all we'll be doing is passing Ptr to ::operator delete(), which is
+  // well formed for nullptr arguments (and allowed by [expr.delete.7]
+  // The overwhelming majority of deletes are of non-nullptr, so there's
+  // no efficiency gain to be had by skipping the very rare exceptions, and
+  // it blee

[PATCH] D43430: Omit nullptr check for sufficiently simple delete-expressions

2018-02-23 Thread Andrew Hunter via Phabricator via cfe-commits
ahh added a comment.

> If the pointer is not null, the runtime overhead of the null check is pretty 
> negligible next to the cost of actually doing the allocation. If the pointer 
> is null, the runtime overhead of making at least one unnecessary call — 
> probably two, if 'operator delete' doesn't do its own null check before 
> calling 'free', and probably one that crosses image boundaries — is not 
> negligible at all. So the relative impact on code that does end up destroying 
> a trivial value is outsized.

In a reply of mine that I think got eaten by list moderation,  I looked into 
this and benchmarked the cost of ::operator delete; with our tcmalloc, the cost 
of deleting null is about 8 cycles (compared to an empty loop.) (I don't really 
know how to benchmark the version with an if around it, but if we assume that's 
free, 8 cycles is still very cheap.)

I suppose this might go up somewhat in an environment where we have to make 
some sort of PLT call or even two. My Google centric response is "don't do 
that"--linking directly against any modern malloc should avoid any jump in 
`::operator delete` and our environment make direct calls quite fast; I'm not 
sure how generalizable that is. (The linking is I think universally good 
advice; I'm not sure who runs in an environment that cannot make efficient far 
calls. But point is that: while your statement is true, the penalty for getting 
this wrong seems very small, and as you say any programmer can if around it at 
a "hot" null delete, no?

This is one of the few aspects of malloc calls that I don't have near-infinite 
telemetry for (our sampling architecture doesn't easily collect it.)  So I 
cannot give you a hard number of the fraction of deletes that are of null 
pointers, but I am convinced that is very small.  Would more (Google-internal, 
obviously) data on this make a decision easier?

I could see why maybe this could be gated on -Os, but I didn't do this for two 
reasons:

- I am new at Clang development and wasn't sure how to put that sort of a check 
in :) Though I can learn if this is a hard requirement.
- From our perspective, I think Google would want this flag in non-size 
optimization (-O2 or whatever.)  We delete null infrequently enough that I'd 
expect this to be a pure cycle win (if a very small one) and even though 
(because?) we don't optimize for size, we have a number of very large binaries, 
and reducing icache hit can help a lot.

I'm unsure exactly how to make progress here, since for one thing I'm unsure 
how strongly you feel about the potential cost/benefits. Guidance would be 
greatly appreciated!


Repository:
  rC Clang

https://reviews.llvm.org/D43430



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits