[clang] [llvm] [not for merge][RFC] Key Instructions front end demo (PR #130943)

via cfe-commits Wed, 12 Mar 2025 04:05:23 -0700

llvmbot wrote:


<!--LLVM PR SUMMARY COMMENT-->
@llvm/pr-subscribers-debuginfo

@llvm/pr-subscribers-clang

Author: Orlando Cazalet-Hyams (OCHyams)

<details>
<summary>Changes</summary>

This draft pull request demonstrates our proposed approach to annotating 
instructions with Key Instructions metadata. It's not fully complete but works 
as a proof of concept and I welcome and encourage feedback on the approach, as 
I'm not particularly familiar with Clang.

The Key Instructions project is introduced here 
https://discourse.llvm.org/t/rfc-improving-is-stmt-placement-for-better-interactive-debugging/82668,
 which includes a "quick summary" section at the top which adds context for 
this PR.

I'll walk through the changes first, followed by some questions at the end.

Note: this PR doesn't include the LLVM parts, so you won't be able to try it 
out locally.

## Overview

We'd like Clang to annotate instructions with additional metadata (bundled 
inside DILocations) so that LLVM can make better `is_stmt` placement decisions. 
Specifically we'll add two new fields to DILocations: An "atom group" number 
for instructions that perform key functionality ("interesting" behaviour from a 
debugger-user perspective, see _Key Instructions: Solving the Code Location 
Problem for Optimized Code (C. Tice, . S. L. Graham, 2000)_); and an "atom 
rank" number which indicates precedence within a set of instructions that have 
the same atom group number.

In the project prototype we interpreted IR in a pre-optimisation pass to decide 
the metadata placement. Having Clang perform the annotation comes with several 
benefits: it allows the front end to be opinionated about stepping locations, 
it's more performant for the front end to do it, and it hopefully improves the 
chance that it can be adopted into other front ends in which the prototype 
approach isn't approprite.

First, here's an example to highlight the new metadata we'll be producing:

`$ cat -n test.cpp`
```
1.  void f(int a) {
2.    int x[256] = {a};
3.  }
```
`clang++ test.cpp -gmlt -emit-llvm -S`
```
define hidden void @<!-- -->_Z1fi(i32 noundef %a) #<!-- -->0 !dbg !11 {
entry:
  %a.addr = alloca i32, align 4
  %x = alloca [256 x i32], align 16
  store i32 %a, ptr %a.addr, align 4
  call void @<!-- -->llvm.memset.p0.i64(ptr align 16 %x, i8 0, i64 1024, i1 
false), !dbg !14      ; atom 1, rank 1
  %arrayinit.begin = getelementptr inbounds [256 x i32], ptr %x, i64 0, i64 0, 
!dbg !15
  %0 = load i32, ptr %a.addr, align 4, !dbg !16                                 
          ; atom 1, rank 2
  store i32 %0, ptr %arrayinit.begin, align 4, !dbg !17                         
          ; atom 1, rank 1
  ret void, !dbg !18                                                            
          ; atom 2, rank 1
}

...
!14 = !DILocation(line: 2, column: 7, scope: !11, atomGroup: 1, atomRank: 1)
!15 = !DILocation(line: 2, column: 16, scope: !11)
!16 = !DILocation(line: 2, column: 17, scope: !11, atomGroup: 1, atomRank: 2)
!17 = !DILocation(line: 2, column: 16, scope: !11, atomGroup: 1, atomRank: 1)
!18 = !DILocation(line: 3, column: 1, scope: !11, atomGroup: 2, atomRank: 1)
```
Atom 1 represents the aggregate initialization of `x`. The store and memset 
have rank 1. That gives them precedence the load of `a` with rank 2, which is 
essentially a "backup instruction" for the purposes of assigning `is_stmt`. If 
all rank 1 instructions are optimized away, we'll use that instead. Atom 2 
represents the implicit return (given the source location of the closing brace).

&lt;details&gt;
&lt;summary&gt;
DWARF emission info for additional context (feel free to ignore).
&lt;/summary&gt;

Not shown in this patch - During dwarf emission, for each atom group, the last 
instruction in a block that shares the lowest rank will be candidates for 
`is_stmt`. The wording is tricky, but essentially if any rank 1 instructions 
exist, higher rank (lower precedence) instructions won't be candidates for 
`is_stmt` even if they're in different blocks, but all the final rank 1 
instructions in the group in different blocks will be.

As an optimisation for convinience (mostly to minimise unecessary difference 
when the feature is enabled, but also to improve variable availability in some 
cases), we apply a heuristc that "floats" the `is_stmt` up to the first 
instruction in a contiguous block of instructions with the same line number.

In the example above the `store` in atom 1 is the candidate for `is_stmt` (the 
`memset` comes before the `store` and the `load` is lower precedence as well as 
before the `store`). Because of the heuristic described above, the `is_stmt` 
flag floats up to the memset, as all the preceeding instructions have the same 
line number.
&lt;/details&gt;

## Implementation

We need to annotate assignments, conditional branches, some unconditional 
branches, and calls as these are instructions implementing "key functionality". 
The GEP used to compute the offset at which to store a value for an assignment 
is not interesting, but the store itself is. We also want to annotate "backup 
instructions". E.g., the instruction computing the value being stored, or the 
`cmp` used for a conditional branch.

`ApplyAtomGroup` works similarly to and alongside `ApplyDebugLocation` as an 
RAII wrapper around a "current atom group" number during CodeGen. The current 
atom number is applied to certain instructions as they're emitted (stores, 
branches, etc) using `addInstToCurrentSourceAtom`.

Here's how it looks with the aggregate initialisation example from the overview 
section:

&lt;details&gt;
&lt;summary&gt;
`EmitAutoVarInit` creates a new atom group with `ApplyAtomGroup`.
&lt;/summary&gt;

```
&gt; 
clang::CodeGen::ApplyAtomGroup::ApplyAtomGroup(clang::CodeGen::CodeGenFunction 
&amp; CGF) Line 184
  clang::CodeGen::CodeGenFunction::EmitAutoVarInit(const 
clang::CodeGen::CodeGenFunction::AutoVarEmission &amp; emission) Line 1958 ++
  clang::CodeGen::CodeGenFunction::EmitAutoVarDecl(const clang::VarDecl &amp; 
D) Line 1358
  clang::CodeGen::CodeGenFunction::EmitVarDecl(const clang::VarDecl &amp; D) 
Line 219
  ...
```
&lt;/details&gt;

&lt;details&gt;
&lt;summary&gt;
`CheckAggExprForMemSetUse` calls `addInstToCurrentSourceAtom` to add the memset 
to the current atom group.
&lt;/summary&gt;

```
&gt; 
clang::CodeGen::CodeGenFunction::addInstToCurrentSourceAtom(llvm::Instruction * 
KeyInstruction, llvm::Value * Backup, unsigned   char KeyInstRank) Line 2551
  CheckAggExprForMemSetUse(clang::CodeGen::AggValueSlot &amp; Slot, const 
clang::Expr * E, clang::CodeGen::CodeGenFunction &amp; CGF) Line   2026
  clang::CodeGen::CodeGenFunction::EmitAggExpr(const clang::Expr * E, 
clang::CodeGen::AggValueSlot Slot) Line 2043
  clang::CodeGen::CodeGenFunction::EmitExprAsInit(const clang::Expr * init, 
const clang::ValueDecl * D, clang::CodeGen::LValue   lvalue, bool 
capturedByInit) Line 2104
  clang::CodeGen::CodeGenFunction::EmitAutoVarInit(const 
clang::CodeGen::CodeGenFunction::AutoVarEmission &amp; emission) Line 2047
  clang::CodeGen::CodeGenFunction::EmitAutoVarDecl(const clang::VarDecl &amp; 
D) Line 1358
  clang::CodeGen::CodeGenFunction::EmitVarDecl(const clang::VarDecl &amp; D) 
Line 219
  ...
```
&lt;/details&gt;

&lt;details&gt;
&lt;summary&gt;
`EmitStoreOfScalar` does the same for the store (note - this is the same atom 
group number as the memset).
&lt;/summary&gt;

```
&gt; 
clang::CodeGen::CodeGenFunction::addInstToCurrentSourceAtom(llvm::Instruction * 
KeyInstruction, llvm::Value * Backup, unsigned char KeyInstRank) Line 2551
  clang::CodeGen::CodeGenFunction::EmitStoreOfScalar(llvm::Value * Value, 
clang::CodeGen::Address Addr, bool Volatile, clang::QualType Ty, 
clang::CodeGen::LValueBaseInfo BaseInfo, clang::CodeGen::TBAAAccessInfo 
TBAAInfo, bool isInit, bool isNontemporal) Line 2133
  clang::CodeGen::CodeGenFunction::EmitStoreOfScalar(llvm::Value * value, 
clang::CodeGen::LValue lvalue, bool isInit) Line 2152
  
clang::CodeGen::CodeGenFunction::EmitStoreThroughLValue(clang::CodeGen::RValue 
Src, clang::CodeGen::LValue Dst, bool isInit) Line 2466
  clang::CodeGen::CodeGenFunction::EmitScalarInit(const clang::Expr * init, 
const clang::ValueDecl * D, clang::CodeGen::LValue lvalue, bool capturedByInit) 
Line 803
  `anonymous namespace'::AggExprEmitter::EmitInitializationToLValue(clang::Expr 
* E, clang::CodeGen::LValue LV) Line 1592
  `anonymous namespace'::AggExprEmitter::EmitArrayInit(clang::CodeGen::Address 
DestPtr, llvm::ArrayType * AType, clang::QualType rrayQTy, clang::Expr * 
ExprToVisit, llvm::ArrayRef&lt;clang::Expr *&gt; Args, clang::Expr * 
ArrayFiller) Line 614
  `anonymous 
namespace'::AggExprEmitter::VisitCXXParenListOrInitListExpr(clang::Expr * 
ExprToVisit, llvm::ArrayRef&lt;clang::Expr *&gt; nitExprs, clang::FieldDecl * 
InitializedFieldInUnion, clang::Expr * ArrayFiller) Line 1672
  `anonymous namespace'::AggExprEmitter::VisitInitListExpr(clang::InitListExpr 
* E) Line 1641
  clang::StmtVisitorBase&lt;std::add_pointer,`anonymous 
namespace'::AggExprEmitter,void&gt;::Visit(clang::Stmt * S) Line 352
  `anonymous namespace'::AggExprEmitter::Visit(clang::Expr * E) Line 114
  clang::CodeGen::CodeGenFunction::EmitAggExpr(const clang::Expr * E, 
clang::CodeGen::AggValueSlot Slot) Line 2045
  clang::CodeGen::CodeGenFunction::EmitExprAsInit(const clang::Expr * init, 
const clang::ValueDecl * D, clang::CodeGen::LValue value, bool capturedByInit) 
Line 2104
  clang::CodeGen::CodeGenFunction::EmitAutoVarInit(const 
clang::CodeGen::CodeGenFunction::AutoVarEmission &amp; emission) Line 2047     
C+
  clang::CodeGen::CodeGenFunction::EmitAutoVarDecl(const clang::VarDecl &amp; 
D) Line 1358
  clang::CodeGen::CodeGenFunction::EmitVarDecl(const clang::VarDecl &amp; D) 
Line 219
  ...
```

&lt;/details&gt;

`addInstToCurrentSourceAtom` annotates the instruction that produces the stored 
value too, with a higher rank than the store (lower precedence).


## Rough edges and questions

* The single-block return statement handling (see changes in `EmitReturnBlock` 
in clang/lib/CodeGen/CGStmt.cpp and `addRetToOverrideOrNewSourceAtom` usage in 
`EmitFunctionEpilog`) doesn't fit the RAII model (I've not found any other 
cases yet though).
* There's some inefficiency: DILocations are attached to instructions then 
immediately replaced with a new version with Key Instructions metadata.
* It doesn't fail gracefully: if we accidentally fail to annotate instructions 
that should be key, those instructions won't be candidate for `is_stmt` and 
therefore will likely be stepped-over while debugging.
    * Is there a way to implement this that errs on over-application of 
annotations?
        * Perhaps every statement (in `EmitStmt`) could be assumed to be in an 
atom group by default, with particular statements opting out, rather than 
interesting ones opting in?
        * I think we could add all stores (and memsets etc) to a "new" atom 
group by default, unless there's an "active" RAII group already or they've 
opted-out. That would offer a graceful fallback for stores. It doesn't help so 
much with control flow (as I think only some branches are interesting, compared 
to most stores being interesting).

Finally, does this direction look reasonable overall?

---

Patch is 90.18 KiB, truncated to 20.00 KiB below, full version: 
https://github.com/llvm/llvm-project/pull/130943.diff


48 Files Affected:

- (modified) clang/lib/CodeGen/CGBuiltin.cpp (+35-14) 
- (modified) clang/lib/CodeGen/CGCall.cpp (+15-1) 
- (modified) clang/lib/CodeGen/CGClass.cpp (+1) 
- (modified) clang/lib/CodeGen/CGCleanup.cpp (+3) 
- (modified) clang/lib/CodeGen/CGDebugInfo.cpp (+148-1) 
- (modified) clang/lib/CodeGen/CGDebugInfo.h (+51) 
- (modified) clang/lib/CodeGen/CGDecl.cpp (+31-22) 
- (modified) clang/lib/CodeGen/CGDeclCXX.cpp (+3-1) 
- (modified) clang/lib/CodeGen/CGExpr.cpp (+21-5) 
- (modified) clang/lib/CodeGen/CGExprAgg.cpp (+7-2) 
- (modified) clang/lib/CodeGen/CGExprComplex.cpp (+10-2) 
- (modified) clang/lib/CodeGen/CGExprScalar.cpp (+4) 
- (modified) clang/lib/CodeGen/CGStmt.cpp (+57-9) 
- (modified) clang/lib/CodeGen/CodeGenFunction.cpp (+44-3) 
- (modified) clang/lib/CodeGen/CodeGenFunction.h (+17-1) 
- (modified) clang/lib/CodeGen/ItaniumCXXABI.cpp (+1) 
- (added) clang/test/KeyInstructions/agg-cpy.cpp (+13) 
- (added) clang/test/KeyInstructions/agg-init-bzero-plus-stores.cpp (+25) 
- (added) clang/test/KeyInstructions/agg-init.cpp (+24) 
- (added) clang/test/KeyInstructions/agg-null-init.cpp (+21) 
- (added) clang/test/KeyInstructions/agg-return-va-arg.cpp (+23) 
- (added) clang/test/KeyInstructions/assign.c (+11) 
- (added) clang/test/KeyInstructions/binop.c (+17) 
- (added) clang/test/KeyInstructions/bitfield.cpp (+15) 
- (added) clang/test/KeyInstructions/cast.cpp (+17) 
- (added) clang/test/KeyInstructions/complex.cpp (+28) 
- (added) clang/test/KeyInstructions/compound-assign.cpp (+13) 
- (added) clang/test/KeyInstructions/do.cpp (+20) 
- (added) clang/test/KeyInstructions/double-assign.cpp (+22) 
- (added) clang/test/KeyInstructions/for.cpp (+34) 
- (added) clang/test/KeyInstructions/if.cpp (+42) 
- (added) clang/test/KeyInstructions/inc.cpp (+13) 
- (added) clang/test/KeyInstructions/member-init.cpp (+13) 
- (added) clang/test/KeyInstructions/memset.c (+10) 
- (added) clang/test/KeyInstructions/multi-func.cpp (+28) 
- (added) clang/test/KeyInstructions/new.cpp (+36) 
- (added) clang/test/KeyInstructions/ret-agg.cpp (+23) 
- (added) clang/test/KeyInstructions/return-no-loc.c (+26) 
- (added) clang/test/KeyInstructions/return-ref.cpp (+19) 
- (added) clang/test/KeyInstructions/return.c (+36) 
- (added) clang/test/KeyInstructions/return.cpp (+26) 
- (added) clang/test/KeyInstructions/scalar-init.cpp (+17) 
- (added) clang/test/KeyInstructions/static-init.cpp (+13) 
- (added) clang/test/KeyInstructions/switch.cpp (+48) 
- (added) clang/test/KeyInstructions/try-catch.cpp (+20) 
- (added) clang/test/KeyInstructions/while.cpp (+20) 
- (modified) llvm/include/llvm/IR/LLVMContext.h (+4) 
- (modified) llvm/lib/IR/LLVMContext.cpp (+4) 


``````````diff
diff --git a/clang/lib/CodeGen/CGBuiltin.cpp b/clang/lib/CodeGen/CGBuiltin.cpp
index afd9798cd639b..8add3d7296c2e 100644
--- a/clang/lib/CodeGen/CGBuiltin.cpp
+++ b/clang/lib/CodeGen/CGBuiltin.cpp
@@ -13,6 +13,7 @@
 #include "ABIInfo.h"
 #include "CGCUDARuntime.h"
 #include "CGCXXABI.h"
+#include "CGDebugInfo.h"
 #include "CGHLSLRuntime.h"
 #include "CGObjCRuntime.h"
 #include "CGOpenCLRuntime.h"
@@ -39,11 +40,13 @@
 #include "llvm/ADT/APFloat.h"
 #include "llvm/ADT/APInt.h"
 #include "llvm/ADT/FloatingPointMode.h"
+#include "llvm/ADT/ScopeExit.h"
 #include "llvm/ADT/SmallPtrSet.h"
 #include "llvm/ADT/StringExtras.h"
 #include "llvm/Analysis/ValueTracking.h"
 #include "llvm/IR/DataLayout.h"
 #include "llvm/IR/InlineAsm.h"
+#include "llvm/IR/Instruction.h"
 #include "llvm/IR/Intrinsics.h"
 #include "llvm/IR/IntrinsicsAArch64.h"
 #include "llvm/IR/IntrinsicsAMDGPU.h"
@@ -97,6 +100,7 @@ static void initializeAlloca(CodeGenFunction &CGF, 
AllocaInst *AI, Value *Size,
   if (CGF.CGM.stopAutoInit())
     return;
   auto *I = CGF.Builder.CreateMemSet(AI, Byte, Size, AlignmentInBytes);
+  CGF.addInstToCurrentSourceAtom(I, nullptr);
   I->addAnnotationMetadata("auto-init");
 }
 
@@ -3543,6 +3547,14 @@ RValue CodeGenFunction::EmitBuiltinExpr(const GlobalDecl 
GD, unsigned BuiltinID,
     }
   };
 
+  // FIXME(OCH): Should any of the maths builtins be key instructions?
+  auto Grp = ApplyAtomGroup(*this);
+  llvm::Instruction *InstForAtomGrp = nullptr;
+  auto Cleanup = llvm::make_scope_exit([&]() {
+    if (InstForAtomGrp)
+      addInstToCurrentSourceAtom(InstForAtomGrp, nullptr);
+  });
+
   switch (BuiltinIDIfNoAsmLabel) {
   default: break;
   case Builtin::BI__builtin___CFStringMakeConstantString:
@@ -3948,6 +3960,8 @@ RValue CodeGenFunction::EmitBuiltinExpr(const GlobalDecl 
GD, unsigned BuiltinID,
   case Builtin::BI_byteswap_ushort:
   case Builtin::BI_byteswap_ulong:
   case Builtin::BI_byteswap_uint64: {
+    // FIXME(OCH): Should bswap and similar intrinsics be key instructions?
+    // If the result is stored then that will be key - is that enough?
     return RValue::get(
         emitBuiltinWithOneOverloadedType<1>(*this, E, Intrinsic::bswap));
   }
@@ -4080,7 +4094,7 @@ RValue CodeGenFunction::EmitBuiltinExpr(const GlobalDecl 
GD, unsigned BuiltinID,
     return RValue::get(Builder.CreateCall(F, {Begin, End}));
   }
   case Builtin::BI__builtin_trap:
-    EmitTrapCall(Intrinsic::trap);
+    InstForAtomGrp = EmitTrapCall(Intrinsic::trap);
     return RValue::get(nullptr);
   case Builtin::BI__builtin_verbose_trap: {
     llvm::DILocation *TrapLocation = Builder.getCurrentDebugLocation();
@@ -4095,10 +4109,10 @@ RValue CodeGenFunction::EmitBuiltinExpr(const 
GlobalDecl GD, unsigned BuiltinID,
     return RValue::get(nullptr);
   }
   case Builtin::BI__debugbreak:
-    EmitTrapCall(Intrinsic::debugtrap);
+    InstForAtomGrp = EmitTrapCall(Intrinsic::debugtrap);
     return RValue::get(nullptr);
   case Builtin::BI__builtin_unreachable: {
-    EmitUnreachable(E->getExprLoc());
+    InstForAtomGrp = EmitUnreachable(E->getExprLoc());
 
     // We do need to preserve an insertion point.
     EmitBlock(createBasicBlock("unreachable.cont"));
@@ -4547,6 +4561,7 @@ RValue CodeGenFunction::EmitBuiltinExpr(const GlobalDecl 
GD, unsigned BuiltinID,
         Matrix, Dst.emitRawPointer(*this),
         Align(Dst.getAlignment().getQuantity()), Stride, IsVolatile,
         MatrixTy->getNumRows(), MatrixTy->getNumColumns());
+    InstForAtomGrp = cast<llvm::Instruction>(Result);
     return RValue::get(Result);
   }
 
@@ -4667,6 +4682,7 @@ RValue CodeGenFunction::EmitBuiltinExpr(const GlobalDecl 
GD, unsigned BuiltinID,
             .getAsAlign();
     AllocaInst *AI = Builder.CreateAlloca(Builder.getInt8Ty(), Size);
     AI->setAlignment(SuitableAlignmentInBytes);
+    // NOTE(OCH): `initializeAlloca` adds Key Instruction metadata.
     if (BuiltinID != Builtin::BI__builtin_alloca_uninitialized)
       initializeAlloca(*this, AI, Size, SuitableAlignmentInBytes);
     LangAS AAS = getASTAllocaAddressSpace();
@@ -4689,6 +4705,7 @@ RValue CodeGenFunction::EmitBuiltinExpr(const GlobalDecl 
GD, unsigned BuiltinID,
         CGM.getContext().toCharUnitsFromBits(AlignmentInBits).getAsAlign();
     AllocaInst *AI = Builder.CreateAlloca(Builder.getInt8Ty(), Size);
     AI->setAlignment(AlignmentInBytes);
+    // NOTE(OCH): `initializeAlloca` adds Key Instruction metadata.
     if (BuiltinID != Builtin::BI__builtin_alloca_with_align_uninitialized)
       initializeAlloca(*this, AI, Size, AlignmentInBytes);
     LangAS AAS = getASTAllocaAddressSpace();
@@ -4707,7 +4724,8 @@ RValue CodeGenFunction::EmitBuiltinExpr(const GlobalDecl 
GD, unsigned BuiltinID,
     Value *SizeVal = EmitScalarExpr(E->getArg(1));
     EmitNonNullArgCheck(Dest, E->getArg(0)->getType(),
                         E->getArg(0)->getExprLoc(), FD, 0);
-    Builder.CreateMemSet(Dest, Builder.getInt8(0), SizeVal, false);
+    InstForAtomGrp =
+        Builder.CreateMemSet(Dest, Builder.getInt8(0), SizeVal, false);
     return RValue::get(nullptr);
   }
 
@@ -4722,7 +4740,7 @@ RValue CodeGenFunction::EmitBuiltinExpr(const GlobalDecl 
GD, unsigned BuiltinID,
     EmitNonNullArgCheck(RValue::get(Dest.emitRawPointer(*this)),
                         E->getArg(1)->getType(), E->getArg(1)->getExprLoc(), 
FD,
                         0);
-    Builder.CreateMemMove(Dest, Src, SizeVal, false);
+    InstForAtomGrp = Builder.CreateMemMove(Dest, Src, SizeVal, false);
     return RValue::get(nullptr);
   }
 
@@ -4735,7 +4753,7 @@ RValue CodeGenFunction::EmitBuiltinExpr(const GlobalDecl 
GD, unsigned BuiltinID,
     Value *SizeVal = EmitScalarExpr(E->getArg(2));
     EmitArgCheck(TCK_Store, Dest, E->getArg(0), 0);
     EmitArgCheck(TCK_Load, Src, E->getArg(1), 1);
-    Builder.CreateMemCpy(Dest, Src, SizeVal, false);
+    InstForAtomGrp = Builder.CreateMemCpy(Dest, Src, SizeVal, false);
     if (BuiltinID == Builtin::BImempcpy ||
         BuiltinID == Builtin::BI__builtin_mempcpy)
       return RValue::get(Builder.CreateInBoundsGEP(
@@ -4751,7 +4769,7 @@ RValue CodeGenFunction::EmitBuiltinExpr(const GlobalDecl 
GD, unsigned BuiltinID,
         E->getArg(2)->EvaluateKnownConstInt(getContext()).getZExtValue();
     EmitArgCheck(TCK_Store, Dest, E->getArg(0), 0);
     EmitArgCheck(TCK_Load, Src, E->getArg(1), 1);
-    Builder.CreateMemCpyInline(Dest, Src, Size);
+    InstForAtomGrp = Builder.CreateMemCpyInline(Dest, Src, Size);
     return RValue::get(nullptr);
   }
 
@@ -4772,7 +4790,7 @@ RValue CodeGenFunction::EmitBuiltinExpr(const GlobalDecl 
GD, unsigned BuiltinID,
     Address Dest = EmitPointerWithAlignment(E->getArg(0));
     Address Src = EmitPointerWithAlignment(E->getArg(1));
     Value *SizeVal = llvm::ConstantInt::get(Builder.getContext(), Size);
-    Builder.CreateMemCpy(Dest, Src, SizeVal, false);
+    InstForAtomGrp = Builder.CreateMemCpy(Dest, Src, SizeVal, false);
     return RValue::get(Dest, *this);
   }
 
@@ -4798,7 +4816,7 @@ RValue CodeGenFunction::EmitBuiltinExpr(const GlobalDecl 
GD, unsigned BuiltinID,
     Address Dest = EmitPointerWithAlignment(E->getArg(0));
     Address Src = EmitPointerWithAlignment(E->getArg(1));
     Value *SizeVal = llvm::ConstantInt::get(Builder.getContext(), Size);
-    Builder.CreateMemMove(Dest, Src, SizeVal, false);
+    InstForAtomGrp = Builder.CreateMemMove(Dest, Src, SizeVal, false);
     return RValue::get(Dest, *this);
   }
 
@@ -4809,7 +4827,7 @@ RValue CodeGenFunction::EmitBuiltinExpr(const GlobalDecl 
GD, unsigned BuiltinID,
     Value *SizeVal = EmitScalarExpr(E->getArg(2));
     EmitArgCheck(TCK_Store, Dest, E->getArg(0), 0);
     EmitArgCheck(TCK_Load, Src, E->getArg(1), 1);
-    Builder.CreateMemMove(Dest, Src, SizeVal, false);
+    InstForAtomGrp = Builder.CreateMemMove(Dest, Src, SizeVal, false);
     return RValue::get(Dest, *this);
   }
   case Builtin::BImemset:
@@ -4820,7 +4838,7 @@ RValue CodeGenFunction::EmitBuiltinExpr(const GlobalDecl 
GD, unsigned BuiltinID,
     Value *SizeVal = EmitScalarExpr(E->getArg(2));
     EmitNonNullArgCheck(Dest, E->getArg(0)->getType(),
                         E->getArg(0)->getExprLoc(), FD, 0);
-    Builder.CreateMemSet(Dest, ByteVal, SizeVal, false);
+    InstForAtomGrp = Builder.CreateMemSet(Dest, ByteVal, SizeVal, false);
     return RValue::get(Dest, *this);
   }
   case Builtin::BI__builtin_memset_inline: {
@@ -4832,7 +4850,7 @@ RValue CodeGenFunction::EmitBuiltinExpr(const GlobalDecl 
GD, unsigned BuiltinID,
     EmitNonNullArgCheck(RValue::get(Dest.emitRawPointer(*this)),
                         E->getArg(0)->getType(), E->getArg(0)->getExprLoc(), 
FD,
                         0);
-    Builder.CreateMemSetInline(Dest, ByteVal, Size);
+    InstForAtomGrp = Builder.CreateMemSetInline(Dest, ByteVal, Size);
     return RValue::get(nullptr);
   }
   case Builtin::BI__builtin___memset_chk: {
@@ -4849,10 +4867,13 @@ RValue CodeGenFunction::EmitBuiltinExpr(const 
GlobalDecl GD, unsigned BuiltinID,
     Value *ByteVal = Builder.CreateTrunc(EmitScalarExpr(E->getArg(1)),
                                          Builder.getInt8Ty());
     Value *SizeVal = llvm::ConstantInt::get(Builder.getContext(), Size);
-    Builder.CreateMemSet(Dest, ByteVal, SizeVal, false);
+    InstForAtomGrp = Builder.CreateMemSet(Dest, ByteVal, SizeVal, false);
     return RValue::get(Dest, *this);
   }
   case Builtin::BI__builtin_wmemchr: {
+    // FIXME(OCH): Probably ok for none of the inline implementation to be key.
+    // If the result is stored, that store should be a stepping location.
+
     // The MSVC runtime library does not provide a definition of wmemchr, so we
     // need an inline implementation.
     if (!getTarget().getTriple().isOSMSVCRT())
@@ -6462,7 +6483,7 @@ RValue CodeGenFunction::EmitBuiltinExpr(const GlobalDecl 
GD, unsigned BuiltinID,
     Value *Val = EmitScalarExpr(E->getArg(0));
     Address Address = EmitPointerWithAlignment(E->getArg(1));
     Value *HalfVal = Builder.CreateFPTrunc(Val, Builder.getHalfTy());
-    Builder.CreateStore(HalfVal, Address);
+    InstForAtomGrp = Builder.CreateStore(HalfVal, Address);
     return RValue::get(nullptr);
   }
   case Builtin::BI__builtin_load_half: {
diff --git a/clang/lib/CodeGen/CGCall.cpp b/clang/lib/CodeGen/CGCall.cpp
index bfcbc273dbda7..9c7286147d564 100644
--- a/clang/lib/CodeGen/CGCall.cpp
+++ b/clang/lib/CodeGen/CGCall.cpp
@@ -17,6 +17,7 @@
 #include "CGBlocks.h"
 #include "CGCXXABI.h"
 #include "CGCleanup.h"
+#include "CGDebugInfo.h"
 #include "CGRecordLayout.h"
 #include "CodeGenFunction.h"
 #include "CodeGenModule.h"
@@ -37,6 +38,7 @@
 #include "llvm/IR/CallingConv.h"
 #include "llvm/IR/DataLayout.h"
 #include "llvm/IR/InlineAsm.h"
+#include "llvm/IR/Instructions.h"
 #include "llvm/IR/IntrinsicInst.h"
 #include "llvm/IR/Intrinsics.h"
 #include "llvm/IR/Type.h"
@@ -3883,7 +3885,8 @@ void CodeGenFunction::EmitFunctionEpilog(const 
CGFunctionInfo &FI,
 
   // Functions with no result always return void.
   if (!ReturnValue.isValid()) {
-    Builder.CreateRetVoid();
+    auto *I = Builder.CreateRetVoid();
+    addRetToOverrideOrNewSourceAtom(I, nullptr);
     return;
   }
 
@@ -4065,6 +4068,9 @@ void CodeGenFunction::EmitFunctionEpilog(const 
CGFunctionInfo &FI,
 
   if (RetDbgLoc)
     Ret->setDebugLoc(std::move(RetDbgLoc));
+
+  llvm::Value *Backup = RV ? Ret->getOperand(0) : nullptr;
+  addRetToOverrideOrNewSourceAtom(cast<llvm::ReturnInst>(Ret), Backup);
 }
 
 void CodeGenFunction::EmitReturnValueCheck(llvm::Value *RV) {
@@ -5829,6 +5835,14 @@ RValue CodeGenFunction::EmitCall(const CGFunctionInfo 
&CallInfo,
                               BundleList);
     EmitBlock(Cont);
   }
+
+  // NOTE(OCH) We only want the group to apply to the call instuction
+  // specifically. N.B. we currently apply is_stmt to all calls at DWARF
+  // emission time. That makes it easy to avoid "over propagating" is_stmt when
+  // calls are lowered. That's easiest, so we continue to do that for now.
+  // FIXME(OCH): Reinstate this once that is no longer the case.
+  // addInstToNewSourceAtom(CI, nullptr);
+
   if (CI->getCalledFunction() && CI->getCalledFunction()->hasName() &&
       CI->getCalledFunction()->getName().starts_with("_Z4sqrt")) {
     SetSqrtFPAccuracy(CI);
diff --git a/clang/lib/CodeGen/CGClass.cpp b/clang/lib/CodeGen/CGClass.cpp
index e54fd543f217b..d69fe1e509ffe 100644
--- a/clang/lib/CodeGen/CGClass.cpp
+++ b/clang/lib/CodeGen/CGClass.cpp
@@ -1339,6 +1339,7 @@ void CodeGenFunction::EmitCtorPrologue(const 
CXXConstructorDecl *CD,
     assert(!Member->isBaseInitializer());
     assert(Member->isAnyMemberInitializer() &&
            "Delegating initializer on non-delegating constructor");
+    auto Grp = ApplyAtomGroup(*this);
     CM.addMemberInitializer(Member);
   }
   CM.finish();
diff --git a/clang/lib/CodeGen/CGCleanup.cpp b/clang/lib/CodeGen/CGCleanup.cpp
index 7e1c5b7da9552..8462e4a1f3791 100644
--- a/clang/lib/CodeGen/CGCleanup.cpp
+++ b/clang/lib/CodeGen/CGCleanup.cpp
@@ -17,6 +17,7 @@
 
//===----------------------------------------------------------------------===//
 
 #include "CGCleanup.h"
+#include "CGDebugInfo.h"
 #include "CodeGenFunction.h"
 #include "llvm/Support/SaveAndRestore.h"
 
@@ -1118,6 +1119,8 @@ void CodeGenFunction::EmitBranchThroughCleanup(JumpDest 
Dest) {
 
   // Create the branch.
   llvm::BranchInst *BI = Builder.CreateBr(Dest.getBlock());
+  // This is the primary instruction for this atom, acting as a ret.
+  addInstToCurrentSourceAtom(BI, nullptr);
 
   // Calculate the innermost active normal cleanup.
   EHScopeStack::stable_iterator
diff --git a/clang/lib/CodeGen/CGDebugInfo.cpp 
b/clang/lib/CodeGen/CGDebugInfo.cpp
index 0e6daa42ee7bf..8bf5d1418f431 100644
--- a/clang/lib/CodeGen/CGDebugInfo.cpp
+++ b/clang/lib/CodeGen/CGDebugInfo.cpp
@@ -43,6 +43,7 @@
 #include "llvm/IR/Constants.h"
 #include "llvm/IR/DataLayout.h"
 #include "llvm/IR/DerivedTypes.h"
+#include "llvm/IR/Instruction.h"
 #include "llvm/IR/Instructions.h"
 #include "llvm/IR/Intrinsics.h"
 #include "llvm/IR/Metadata.h"
@@ -52,6 +53,7 @@
 #include "llvm/Support/SHA1.h"
 #include "llvm/Support/SHA256.h"
 #include "llvm/Support/TimeProfiler.h"
+#include <cstdint>
 #include <optional>
 using namespace clang;
 using namespace clang::CodeGen;
@@ -119,6 +121,144 @@ CGDebugInfo::~CGDebugInfo() {
          "Region stack mismatch, stack not empty!");
 }
 
+void CGDebugInfo::addInstSourceAtomMetadata(llvm::Instruction *I,
+                                            uint64_t Group, uint8_t Rank) {
+  if (!I->getDebugLoc() || Group == 0 || !I->getDebugLoc()->getLine())
+    return;
+
+  // Saturate the 3-bit rank.
+  Rank = std::min<uint8_t>(Rank, 7);
+
+  const llvm::DebugLoc &DL = I->getDebugLoc();
+
+  // Each instruction can only be attributed to one source atom (as an
+  // limitation of the implementation). If this instruction is already
+  // part of a source atom, pick the group in which it has highest
+  // precedence (lowest rank).
+  // TODO(OCH): Is there a better way to handle merging? Ideally we'd like
+  // to be able to have it attributed to both atoms.
+  if (DL.get()->getAtomGroup() && DL.get()->getAtomRank() &&
+      DL.get()->getAtomRank() < Rank) {
+    Group = DL.get()->getAtomGroup();
+    Rank = DL.get()->getAtomRank();
+  }
+
+  // Update the watermark to indicate this Group ID has been used
+  // in this function.
+  HighestEmittedAtomGroup = std::max(Group, HighestEmittedAtomGroup);
+
+  llvm::DILocation *NewDL = llvm::DILocation::get(
+      I->getContext(), DL.getLine(), DL.getCol(), DL.getScope(),
+      DL.getInlinedAt(), DL.isImplicitCode(), Group, Rank);
+  I->setDebugLoc(NewDL);
+};
+
+void CGDebugInfo::addInstToCurrentSourceAtom(llvm::Instruction *KeyInstruction,
+                                             llvm::Value *Backup,
+                                             uint8_t KeyInstRank) {
+  /* TODO(OCH):
+  if (key-instructions-is-not-enabled)
+    return;
+  */
+  uint64_t Group = CurrentAtomGroup;
+  if (!Group)
+    return;
+
+  KeyInstRank += CurrentAtomRankBase;
+  addInstSourceAtomMetadata(KeyInstruction, Group, KeyInstRank);
+
+  llvm::Instruction *BackupI =
+      llvm::dyn_cast_or_null<llvm::Instruction>(Backup);
+  if (!BackupI)
+    return;
+
+  // Add the backup instruction to the group.
+  addInstSourceAtomMetadata(BackupI, Group, /*Rank*/ ++KeyInstRank);
+
+  // Look through chains of casts too, as they're probably going to evaporate.
+  // FIXME(OCH): And other nops like zero length geps?
+  // FIXME(OCH): Should use Cast->isNoopCast()?
+  while (auto *Cast = dyn_cast<llvm::CastInst>(BackupI)) {
+    BackupI = dyn_cast<llvm::Instruction>(Cast->getOperand(0));
+    if (!BackupI)
+      break;
+    addInstSourceAtomMetadata(BackupI, Group, ++KeyInstRank);
+  }
+}
+
+void CGDebugInfo::addRetToOverrideOrNewSourceAtom(llvm::ReturnInst *Ret,
+                                                  llvm::Value *Backup,
+                                                  uint8_t KeyInstRank) {
+  if (RetAtomGroupOverride) {
+    uint64_t CurGrp = CurrentAtomGroup;
+    CurrentAtomGroup = RetAtomGroupOverride;
+    addInstToCurrentSourceAtom(Ret, Backup, KeyInstRank);
+    CurrentAtomGroup = CurGrp;
+    RetAtomGroupOverride = 0;
+  } else {
+    auto Grp = ApplyAtomGroup(this);
+    addInstToCurrentSourceAtom(Ret, Backup, KeyInstRank);
+  }
+}
+
+void CGDebugInfo::setRetInstSourceAtomOverride(uint64_t Group) {
+  assert(RetAtomGroupOverride == 0);
+  RetAtomGroupOverride = Group;
+}
+
+void CGDebugInfo::completeFunction() {
+  // Atoms are identified by a {AtomGroup, InlinedAt} pair, meaning AtomGroup
+  // numbers can be repeated across different functions. LLVM optimisations may
+  // need to assign new AtomGroups. In order to guarentee that those future
+  // transformations keep the numbers within functions unique, we just need to
+  // track the highest number used across all functions.
+  CGM.getLLVMContext().updateAtomGroupWaterline(NextAtomGroup);
+  NextAtomGroup = 1;
+  HighestEmittedAtomGroup = 0;
+  CurrentAtomGroup = 0;
+  RetAtomGroupOverride = 0;
+  CurrentAtomRankBase = 0;
+}
+
+ApplyAtomGroup::ApplyAtomGroup(CodeGenFunction &CGF)
+    : ApplyAtomGroup(CGF.getDebugInfo()) {}
+
+ApplyAtomGroup::ApplyAtomGroup(CGDebugInfo *DI) : DI(DI) {
+  if (!DI)
+    return;
+  OriginalAtomGroup = DI->CurrentAtomGroup;
+  DI->CurrentAtomGroup = DI->NextAtomGroup++;
+  // Reset rank-base as it should only apply to the group it was added to.
+  OriginalRankBase = DI->CurrentAtomRankBase;
+  DI->CurrentAtomRankBase = 0;
+}
+
+ApplyAtomGroup::~ApplyAtomGroup() {
+  if (!DI)
+    return;
+  if (DI->HighestEmittedAtomGroup < DI->NextAtomGroup - 1)
+    DI->NextAtomGroup = DI->HighestEmittedAtomGroup + 1;
+  DI->CurrentAtomGroup = OriginalAtomGroup;
+
+  DI->CurrentAtomRankBase = OriginalRankBase;
+}
+
+IncAtomRank::IncAtomRank(CodeGenFunction &CGF)
+    : IncAtomRank(CGF.getDebugInfo()) {}
+
+IncAtomRank::IncAtomRank(CGDebugInfo *DI) : DI(DI) {
+  if (!DI)
+    return;
+  ++DI->CurrentAtomRankBase;
+}
+
+IncAtomRank::~IncAtomRank() {
+  if (!DI)
+    return;
+  assert(DI->CurrentAtomRankBase);
+  --DI->CurrentAtomRankBase;
+}
+
 ApplyDebugLocation::ApplyDebugLocation(CodeGenFunction &CGF,
                                        SourceLocation TemporaryLocation)
     : CGF(&CGF) {
@@ -174,8 +314,15 @@ ApplyDebugLocation::ApplyDebugLocation(CodeGenFunction 
&CGF, llvm::DebugLoc Loc)
     return;
   }
   OriginalLocation = CGF.Builder.getCurrentDebugLocation();
-  if (Loc)
+  if (Loc) {
+    // Key Instructions: drop the atom group and rank to avoid accidentally
+    // propagating it around.
+    if (Loc->getAtomGroup())
+      Loc = llvm::DILocation::get(Loc->getContext(), Loc.getLine(),
+                                  Loc->getColumn(), Loc->getScope(),
+                                  Loc->getInlinedAt(), Loc.isImplicitCode());
     CGF.Builder.SetCurrentDebugLocation(std::move(Loc));
+  }
 }
 
 ApplyDebugLocation::~ApplyDebugLocation() {
diff --git a/clang/lib/CodeGen/CGDebugInfo.h b/clang/lib/CodeGen/CGDebugInfo.h
index 38f73eca561b7..997cf0e3db215 100644
--- a/clang/lib/CodeGen/CGDebugInfo.h
+++ b/clang/lib/CodeGen/CGDebugInfo.h
@@ -58,6 +58,9 @@ class CGBlockInfo;
 class CGDebugInfo {
   friend class ApplyDebugLocation;
   friend class SaveAndRestoreLocation;
+  friend class ApplyAtomGroup;
+  friend class IncAtomRank;
+
   CodeGenModule &CGM;
   const llvm::codegenoptions::DebugInfoKind Debu...
[truncated]

``````````

</details>


https://github.com/llvm/llvm-project/pull/130943
_______________________________________________
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[clang] [llvm] [not for merge][RFC] Key Instructions front end demo (PR #130943)

Reply via email to