[PATCH] D43430: Omit nullptr check for sufficiently simple delete-expressions
ahh created this revision. ahh added a reviewer: rsmith. Herald added a subscriber: cfe-commits. [expr.delete] is pretty crystal clear that it's OK to invoke a deallocation-function on a nullptr delete-expression: "If the value of the operand of the delete-expression is a null pointer value, it is unspecified whether a deallocation function will be called as described above." Even so, we currently check against nullptr unconditionally. This is wasteful for anything with a trivial destructor; deleting nullptr is very rare so it's not worth saving the call to ::operator delete, and this is a useless branch (and waste of code size) in the common case. Condition emission of the branch on us needing to actually look through the pointer for a vtable, size cookie, or nontrivial destructor. (In principle a nontrivial destructor with no side effects that didn't touch the object would also be safe to run unconditionally, but I don't know how to test we have one of those and who in the world would write one?) On an important and very large (~500 MiB) Google search binary, this saves approximately 32 KiB of text. Okay, it's not impressive in a relative sense, but it's still the right thing to do. A note on optimization: we still successfully elide delete-expressions of (literal) nullptr. Where before they were stuck behind a never-taken branch, now they reduce (effectively) to calls to __builtin_operator_delete(nullptr), which EarlyCSE is capable of optimizing out. So this has no cost in the already well-optimized case. Repository: rC Clang https://reviews.llvm.org/D43430 Files: lib/CodeGen/CGCXXABI.h lib/CodeGen/CGExprCXX.cpp test/CodeGenCXX/cxx2a-destroying-delete.cpp test/CodeGenCXX/delete-two-arg.cpp test/CodeGenCXX/delete.cpp Index: test/CodeGenCXX/delete.cpp === --- test/CodeGenCXX/delete.cpp +++ test/CodeGenCXX/delete.cpp @@ -9,7 +9,12 @@ }; // POD types. +// CHECK-LABEL: define void @_Z2t3P1S void t3(S *s) { + // CHECK-NOT: icmp eq {{.*}}, null + // CHECK-NOT: br i1 + // CHECK: call void @_ZdlPv + delete s; } @@ -99,6 +104,8 @@ namespace test3 { void f(int a[10][20]) { +// CHECK-NOT: icmp eq {{.*}}, null +// CHECK-NOT: br i1 // CHECK: call void @_ZdaPv(i8* delete a; } @@ -113,6 +120,8 @@ // CHECK-LABEL: define void @_ZN5test421global_delete_virtualEPNS_1XE void global_delete_virtual(X *xp) { +// CHECK: icmp eq {{.*}}, null +// CHECK: br i1 // Load the offset-to-top from the vtable and apply it. // This has to be done first because the dtor can mess it up. // CHECK: [[T0:%.*]] = bitcast [[X:%.*]]* [[XP:%.*]] to i64** Index: test/CodeGenCXX/delete-two-arg.cpp === --- test/CodeGenCXX/delete-two-arg.cpp +++ test/CodeGenCXX/delete-two-arg.cpp @@ -9,8 +9,8 @@ // CHECK-LABEL: define void @_ZN5test11aEPNS_1AE( void a(A *x) { // CHECK: load -// CHECK-NEXT: icmp eq {{.*}}, null -// CHECK-NEXT: br i1 +// CHECK-NOT: icmp eq {{.*}}, null +// CHECK-NOT: br i1 // CHECK: call void @_ZN5test11AdlEPvj(i8* %{{.*}}, i32 4) delete x; } Index: test/CodeGenCXX/cxx2a-destroying-delete.cpp === --- test/CodeGenCXX/cxx2a-destroying-delete.cpp +++ test/CodeGenCXX/cxx2a-destroying-delete.cpp @@ -15,9 +15,8 @@ void delete_A(A *a) { delete a; } // CHECK-LABEL: define {{.*}}delete_A // CHECK: %[[a:.*]] = load -// CHECK: icmp eq %{{.*}} %[[a]], null -// CHECK: br i1 -// +// CHECK-NOT: icmp eq %{{.*}} %[[a]], null +// CHECK-NOT: br i1 // Ensure that we call the destroying delete and not the destructor. // CHECK-NOT: call // CHECK-ITANIUM: call void @_ZN1AdlEPS_St19destroying_delete_t(%{{.*}}* %[[a]]) @@ -60,8 +59,8 @@ // CHECK: %[[castbase:.*]] = bitcast {{.*}} %[[base]] // // CHECK: %[[a:.*]] = phi {{.*}} %[[castbase]] -// CHECK: icmp eq %{{.*}} %[[a]], null -// CHECK: br i1 +// CHECK-NOT: icmp eq %{{.*}} %[[a]], null +// CHECK-NOT: br i1 // // CHECK-NOT: call // CHECK-ITANIUM: call void @_ZN1AdlEPS_St19destroying_delete_t(%{{.*}}* %[[a]]) Index: lib/CodeGen/CGExprCXX.cpp === --- lib/CodeGen/CGExprCXX.cpp +++ lib/CodeGen/CGExprCXX.cpp @@ -1971,26 +1971,58 @@ CGF.PopCleanupBlock(); } -void CodeGenFunction::EmitCXXDeleteExpr(const CXXDeleteExpr *E) { - const Expr *Arg = E->getArgument(); - Address Ptr = EmitPointerWithAlignment(Arg); +// Will we read through the deleted pointer? If so, +// we must first check it is not null. +static bool DeleteMightAccessObject(CodeGenFunction &CGF, + const CXXDeleteExpr *E, + QualType DeleteTy) { - // Null check the pointer. - llvm::BasicBlock *DeleteNotNull = createBasicBlock("delete.notnull"); - llvm::BasicBlock *
[PATCH] D43430: Omit nullptr check for sufficiently simple delete-expressions
ahh added a comment. On my workstation's checkout of head, one test fails (Clang :: Driver/response-file.c) both with and without this change; everything else appears to pass. I believe that between the tests I add to delete.cpp and the ones that are already there (and destroying-delete.cpp) we cover every case that has to get a nullptr check, and pretty much every one that should *not*. Name of the helper function is, uh, good enough for me, but no objections to changing it. Repository: rC Clang https://reviews.llvm.org/D43430 ___ cfe-commits mailing list cfe-commits@lists.llvm.org http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[PATCH] D43430: Omit nullptr check for sufficiently simple delete-expressions
ahh updated this revision to Diff 134846. ahh added a comment. Fix indentation Repository: rC Clang https://reviews.llvm.org/D43430 Files: lib/CodeGen/CGCXXABI.h lib/CodeGen/CGExprCXX.cpp test/CodeGenCXX/cxx2a-destroying-delete.cpp test/CodeGenCXX/delete-two-arg.cpp test/CodeGenCXX/delete.cpp Index: test/CodeGenCXX/delete.cpp === --- test/CodeGenCXX/delete.cpp +++ test/CodeGenCXX/delete.cpp @@ -9,7 +9,12 @@ }; // POD types. +// CHECK-LABEL: define void @_Z2t3P1S void t3(S *s) { + // CHECK-NOT: icmp eq {{.*}}, null + // CHECK-NOT: br i1 + // CHECK: call void @_ZdlPv + delete s; } @@ -99,6 +104,8 @@ namespace test3 { void f(int a[10][20]) { +// CHECK-NOT: icmp eq {{.*}}, null +// CHECK-NOT: br i1 // CHECK: call void @_ZdaPv(i8* delete a; } @@ -113,6 +120,8 @@ // CHECK-LABEL: define void @_ZN5test421global_delete_virtualEPNS_1XE void global_delete_virtual(X *xp) { +// CHECK: icmp eq {{.*}}, null +// CHECK: br i1 // Load the offset-to-top from the vtable and apply it. // This has to be done first because the dtor can mess it up. // CHECK: [[T0:%.*]] = bitcast [[X:%.*]]* [[XP:%.*]] to i64** Index: test/CodeGenCXX/delete-two-arg.cpp === --- test/CodeGenCXX/delete-two-arg.cpp +++ test/CodeGenCXX/delete-two-arg.cpp @@ -9,8 +9,8 @@ // CHECK-LABEL: define void @_ZN5test11aEPNS_1AE( void a(A *x) { // CHECK: load -// CHECK-NEXT: icmp eq {{.*}}, null -// CHECK-NEXT: br i1 +// CHECK-NOT: icmp eq {{.*}}, null +// CHECK-NOT: br i1 // CHECK: call void @_ZN5test11AdlEPvj(i8* %{{.*}}, i32 4) delete x; } Index: test/CodeGenCXX/cxx2a-destroying-delete.cpp === --- test/CodeGenCXX/cxx2a-destroying-delete.cpp +++ test/CodeGenCXX/cxx2a-destroying-delete.cpp @@ -15,9 +15,8 @@ void delete_A(A *a) { delete a; } // CHECK-LABEL: define {{.*}}delete_A // CHECK: %[[a:.*]] = load -// CHECK: icmp eq %{{.*}} %[[a]], null -// CHECK: br i1 -// +// CHECK-NOT: icmp eq %{{.*}} %[[a]], null +// CHECK-NOT: br i1 // Ensure that we call the destroying delete and not the destructor. // CHECK-NOT: call // CHECK-ITANIUM: call void @_ZN1AdlEPS_St19destroying_delete_t(%{{.*}}* %[[a]]) @@ -60,8 +59,8 @@ // CHECK: %[[castbase:.*]] = bitcast {{.*}} %[[base]] // // CHECK: %[[a:.*]] = phi {{.*}} %[[castbase]] -// CHECK: icmp eq %{{.*}} %[[a]], null -// CHECK: br i1 +// CHECK-NOT: icmp eq %{{.*}} %[[a]], null +// CHECK-NOT: br i1 // // CHECK-NOT: call // CHECK-ITANIUM: call void @_ZN1AdlEPS_St19destroying_delete_t(%{{.*}}* %[[a]]) Index: lib/CodeGen/CGExprCXX.cpp === --- lib/CodeGen/CGExprCXX.cpp +++ lib/CodeGen/CGExprCXX.cpp @@ -1971,26 +1971,57 @@ CGF.PopCleanupBlock(); } -void CodeGenFunction::EmitCXXDeleteExpr(const CXXDeleteExpr *E) { - const Expr *Arg = E->getArgument(); - Address Ptr = EmitPointerWithAlignment(Arg); +// Will we read through the deleted pointer? If so, +// we must first check it is not null. +static bool DeleteMightAccessObject(CodeGenFunction &CGF, +const CXXDeleteExpr *E, QualType DeleteTy) { - // Null check the pointer. - llvm::BasicBlock *DeleteNotNull = createBasicBlock("delete.notnull"); - llvm::BasicBlock *DeleteEnd = createBasicBlock("delete.end"); + if (E->getOperatorDelete()->isDestroyingOperatorDelete()) { +// It is safe to call destroying operator delete with nullptr arguments +// ([expr.delete] tells us it is unspecified whether a deallocation +// function is called) but a virtual destructor must be resolved +// to find the right function, which we can't do on nullptr. +auto *Dtor = DeleteTy->getAsCXXRecordDecl()->getDestructor(); +return Dtor && Dtor->isVirtual(); + } - llvm::Value *IsNull = Builder.CreateIsNull(Ptr.getPointer(), "isnull"); + if (E->isArrayForm()) { +return CGF.CGM.getCXXABI().requiresArrayCookie(E, DeleteTy); + } - Builder.CreateCondBr(IsNull, DeleteEnd, DeleteNotNull); - EmitBlock(DeleteNotNull); + // Otherwise, we should avoid invoking any nontrivial destructor on + // a null object. + return DeleteTy.isDestructedType(); +} +void CodeGenFunction::EmitCXXDeleteExpr(const CXXDeleteExpr *E) { + const Expr *Arg = E->getArgument(); + Address Ptr = EmitPointerWithAlignment(Arg); QualType DeleteTy = E->getDestroyedType(); + // Null check the pointer, unless the destructor is trivial. In that case, + // all we'll be doing is passing Ptr to ::operator delete(), which is + // well formed for nullptr arguments (and allowed by [expr.delete.7] + // The overwhelming majority of deletes are of non-nullptr, so there's + // no efficiency gain to be had by skipping the very rare exceptions, and + // it blee
[PATCH] D43430: Omit nullptr check for sufficiently simple delete-expressions
ahh added a comment. > If the pointer is not null, the runtime overhead of the null check is pretty > negligible next to the cost of actually doing the allocation. If the pointer > is null, the runtime overhead of making at least one unnecessary call — > probably two, if 'operator delete' doesn't do its own null check before > calling 'free', and probably one that crosses image boundaries — is not > negligible at all. So the relative impact on code that does end up destroying > a trivial value is outsized. In a reply of mine that I think got eaten by list moderation, I looked into this and benchmarked the cost of ::operator delete; with our tcmalloc, the cost of deleting null is about 8 cycles (compared to an empty loop.) (I don't really know how to benchmark the version with an if around it, but if we assume that's free, 8 cycles is still very cheap.) I suppose this might go up somewhat in an environment where we have to make some sort of PLT call or even two. My Google centric response is "don't do that"--linking directly against any modern malloc should avoid any jump in `::operator delete` and our environment make direct calls quite fast; I'm not sure how generalizable that is. (The linking is I think universally good advice; I'm not sure who runs in an environment that cannot make efficient far calls. But point is that: while your statement is true, the penalty for getting this wrong seems very small, and as you say any programmer can if around it at a "hot" null delete, no? This is one of the few aspects of malloc calls that I don't have near-infinite telemetry for (our sampling architecture doesn't easily collect it.) So I cannot give you a hard number of the fraction of deletes that are of null pointers, but I am convinced that is very small. Would more (Google-internal, obviously) data on this make a decision easier? I could see why maybe this could be gated on -Os, but I didn't do this for two reasons: - I am new at Clang development and wasn't sure how to put that sort of a check in :) Though I can learn if this is a hard requirement. - From our perspective, I think Google would want this flag in non-size optimization (-O2 or whatever.) We delete null infrequently enough that I'd expect this to be a pure cycle win (if a very small one) and even though (because?) we don't optimize for size, we have a number of very large binaries, and reducing icache hit can help a lot. I'm unsure exactly how to make progress here, since for one thing I'm unsure how strongly you feel about the potential cost/benefits. Guidance would be greatly appreciated! Repository: rC Clang https://reviews.llvm.org/D43430 ___ cfe-commits mailing list cfe-commits@lists.llvm.org http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits