[PATCH] D138702: support for HIP non hostcall printf

2022-11-25 Thread Vikram Hegde via Phabricator via cfe-commits
vikramRH created this revision.
vikramRH added reviewers: sameerds, b-sumner, yaxunl.
vikramRH added a project: LLVM.
Herald added subscribers: kosarev, foad, kerbowa, hiraditya, Anastasia, 
jvesely, arsenm.
Herald added a project: All.
vikramRH requested review of this revision.
Herald added subscribers: llvm-commits, cfe-commits, MaskRay.
Herald added a project: clang.

The patch essentially ports the existing non-hostcall printf support in OpenCL 
to HIP (to be controlled via new option "-fdelayed-printf"), with following 
changes

1. Code refactoring -> we now use API's "getConstantStringInfo()" to extract 
constant string contents at compile time.
2. Support to print non-const null terminated strings, required in HIP context. 
This is achieved as follows
  - calculate string size using a function "getStrlenWithNull()" and reserve 
the space in printf buffer using __printf_alloc() (as was the case in OpenCL, 
but number of bytes allocated could be dynamic now)
  - copy the string contents to buffer using previously calculated size and the 
pointer to string (a memcpy intrinsic is generated here)


Repository:
  rG LLVM Github Monorepo

https://reviews.llvm.org/D138702

Files:
  clang/include/clang/Basic/LangOptions.def
  clang/include/clang/Driver/Options.td
  clang/lib/Basic/Builtins.cpp
  clang/lib/Driver/ToolChains/Clang.cpp
  llvm/lib/Target/AMDGPU/AMDGPUPrintfRuntimeBinding.cpp
  llvm/test/CodeGen/AMDGPU/hip-delayed-printf.ll
  llvm/test/CodeGen/AMDGPU/opencl-printf.ll

Index: llvm/test/CodeGen/AMDGPU/opencl-printf.ll
===
--- llvm/test/CodeGen/AMDGPU/opencl-printf.ll
+++ llvm/test/CodeGen/AMDGPU/opencl-printf.ll
@@ -9,18 +9,17 @@
 ; R600: call i32 (i8 addrspace(2)*, ...) @printf(i8 addrspace(2)* getelementptr inbounds ([6 x i8], [6 x i8] addrspace(2)* @.str, i32 0, i32 0), i8* %arraydecay, i32 %n)
 ; GCN-LABEL: entry
 ; GCN: call i8 addrspace(1)* @__printf_alloc
-; GCN-LABEL: entry.split
+; GCN-LABEL: strlen.join.split
 ; GCN: icmp ne i8 addrspace(1)* %printf_alloc_fn, null
+; GCN: br i1 %14, label %15, label %16
 ; GCN: %PrintBuffID = getelementptr i8, i8 addrspace(1)* %printf_alloc_fn, i32 0
 ; GCN: %PrintBuffIdCast = bitcast i8 addrspace(1)* %PrintBuffID to i32 addrspace(1)*
-; GCN: store i32 1, i32 addrspace(1)* %PrintBuffIdCast
+; GCN: store i32 1, i32 addrspace(1)* %PrintBuffIdCast, align 4
 ; GCN: %PrintBuffGep = getelementptr i8, i8 addrspace(1)* %printf_alloc_fn, i32 4
-; GCN: %PrintArgPtr = ptrtoint i8* %arraydecay to i64
-; GCN: %PrintBuffPtrCast = bitcast i8 addrspace(1)* %PrintBuffGep to i64 addrspace(1)*
-; GCN: store i64 %PrintArgPtr, i64 addrspace(1)* %PrintBuffPtrCast
-; GCN: %PrintBuffNextPtr = getelementptr i8, i8 addrspace(1)* %PrintBuffGep, i32 8
-; GCN: %PrintBuffPtrCast1 = bitcast i8 addrspace(1)* %PrintBuffNextPtr to i32 addrspace(1)*
-; GCN: store i32 %n, i32 addrspace(1)* %PrintBuffPtrCast1
+; GCN: call void @llvm.memcpy.p1i8.p0i8.i64(i8 addrspace(1)* %PrintBuffGep, i8* %arraydecay, i64 %9, i1 false)
+; GCN: %PrintBuffNextPtr = getelementptr i8, i8 addrspace(1)* %PrintBuffGep, i64 %11
+; GCN: %PrintBuffPtrCast = bitcast i8 addrspace(1)* %PrintBuffNextPtr to i32 addrspace(1)*
+; GCN: store i32 %n, i32 addrspace(1)* %PrintBuffPtrCast, align 4
 
 @.str = private unnamed_addr addrspace(2) constant [6 x i8] c"%s:%d\00", align 1
 
Index: llvm/test/CodeGen/AMDGPU/hip-delayed-printf.ll
===
--- /dev/null
+++ llvm/test/CodeGen/AMDGPU/hip-delayed-printf.ll
@@ -0,0 +1,57 @@
+; RUN: opt -mtriple=amdgcn--amdhsa -passes=amdgpu-printf-runtime-binding -S < %s | FileCheck --check-prefix=FUNC --check-prefix=GCN --check-prefix=METADATA %s
+
+; FUNC-LABEL: @test_kernel(
+; GCN-LABEL: entry
+; GCN-LABEL: strlen.while
+; GCN: br i1 %6, label %strlen.while.done, label %strlen.while
+; GCN-LABEL: strlen.join
+; GCN: %12 = add i64 %11, 3
+; GCN: %13 = and i64 %12, 4294967292
+; GCN: %14 = add i64 %13, 4
+; GCN: %15 = trunc i64 %14 to i32
+; GCN: %printf_alloc_fn = call ptr addrspace(1) @__printf_alloc(i32 %15)
+; GCN-LABEL: strlen.join.split
+; GCN: %16 = icmp ne ptr addrspace(1) %printf_alloc_fn, null
+; GCN: br i1 %16, label %17, label %18
+; GCN: %PrintBuffID = getelementptr i8, ptr addrspace(1) %printf_alloc_fn, i32 0
+; GCN: %PrintBuffIdCast = bitcast ptr addrspace(1) %PrintBuffID to ptr addrspace(1)
+; GCN: store i32 1, ptr addrspace(1) %PrintBuffIdCast, align 4
+; GCN: %PrintBuffGep = getelementptr i8, ptr addrspace(1) %printf_alloc_fn, i32 4
+; GCN: call void @llvm.memcpy.p1.p0.i64(ptr addrspace(1) %PrintBuffGep, ptr %1, i64 %11, i1 false)
+; GCN: %PrintBuffNextPtr = getelementptr i8, ptr addrspace(1) %PrintBuffGep, i64 %13
+; GCN: br label %18
+
+; METADATA: !llvm.printf.fmts = !{!0}
+; METADATA: !0 = !{!"1:1:8:%s"}
+
+@.str = private unnamed_addr addrspace(4) constant [3 x i8] c"%s\00", align 1
+@.str.1 = private unnamed_addr addrspace(4) constant [6 x i8] c"hel

[PATCH] D138702: support for HIP non hostcall printf

2022-12-01 Thread Vikram Hegde via Phabricator via cfe-commits
vikramRH added a comment.

There were some issues reported where it was found that pcie atomics are not 
available in some environments, as a result hostcall and HIP-Printf were also 
not available. This Patch is intended to provide an alternative in such 
scenarios while hostcall based implementation remains the primary option.

After discussions with Sameer, I am convinced that it is better to do this 
transformation at clang CodeGen phase rather than a opt pass (similar to 
current hostcall based implementation),
Hence this patch needs rework and I shall include the review comments in the 
updated patch. (I agree on the option too, I shall think of a option name with 
a named argument)


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D138702/new/

https://reviews.llvm.org/D138702

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D150427: [AMDGPU] Non hostcall printf support for HIP

2023-05-23 Thread Vikram Hegde via Phabricator via cfe-commits
vikramRH updated this revision to Diff 524590.
vikramRH added a comment.

Handled review comments and rebase


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D150427/new/

https://reviews.llvm.org/D150427

Files:
  clang/include/clang/Basic/TargetOptions.h
  clang/include/clang/Driver/Options.td
  clang/lib/CodeGen/CGGPUBuiltin.cpp
  clang/lib/Driver/ToolChains/Clang.cpp
  clang/test/CodeGenHIP/printf_nonhostcall.cpp
  clang/test/Driver/hip-options.hip
  llvm/include/llvm/Transforms/Utils/AMDGPUEmitPrintf.h
  llvm/lib/Transforms/Utils/AMDGPUEmitPrintf.cpp

Index: llvm/lib/Transforms/Utils/AMDGPUEmitPrintf.cpp
===
--- llvm/lib/Transforms/Utils/AMDGPUEmitPrintf.cpp
+++ llvm/lib/Transforms/Utils/AMDGPUEmitPrintf.cpp
@@ -17,6 +17,8 @@
 #include "llvm/Transforms/Utils/AMDGPUEmitPrintf.h"
 #include "llvm/ADT/SparseBitVector.h"
 #include "llvm/Analysis/ValueTracking.h"
+#include "llvm/Support/DataExtractor.h"
+#include "llvm/Support/MD5.h"
 
 using namespace llvm;
 
@@ -179,11 +181,7 @@
 
 // Scan the format string to locate all specifiers, and mark the ones that
 // specify a string, i.e, the "%s" specifier with optional '*' characters.
-static void locateCStrings(SparseBitVector<8> &BV, Value *Fmt) {
-  StringRef Str;
-  if (!getConstantStringInfo(Fmt, Str) || Str.empty())
-return;
-
+static void locateCStrings(SparseBitVector<8> &BV, StringRef Str) {
   static const char ConvSpecifiers[] = "diouxXfFeEgGaAcspn";
   size_t SpecPos = 0;
   // Skip the first argument, the format string.
@@ -207,14 +205,303 @@
   }
 }
 
-Value *llvm::emitAMDGPUPrintfCall(IRBuilder<> &Builder,
-  ArrayRef Args) {
+// helper struct to package the string related data
+struct StringData {
+  std::string Str = "";
+  bool isConst = true;
+  Value *RealSize = nullptr;
+  Value *AlignedSize = nullptr;
+
+  StringData(std::string str, bool IC, Value *RS, Value *AS)
+  : Str(str), isConst(IC), RealSize(RS), AlignedSize(AS) {}
+};
+
+static inline size_t alignUp(size_t Value, uint Alignment) {
+  return (Value + Alignment - 1) & ~(Alignment - 1);
+}
+
+// Calculates frame size required for current printf expansion and allocates
+// space on printf buffer. Printf frame includes following contents
+// [ ControlDWord , format string/Hash , Arguments (each aligned to 8 byte) ]
+static Value *callBufferedPrintfStart(
+IRBuilder<> &Builder, ArrayRef &Args, Value *Fmt,
+bool isConstFmtStr, SparseBitVector<8> &SpecIsCString,
+SmallVectorImpl &StringContents, Value *&ArgSize) {
+  Value *NonConstStrLen = nullptr;
+
+  // First 4 bytes to be reserved for control dword
+  size_t BufSize = 4;
+  if (isConstFmtStr)
+// First 8 bytes of MD5 hash
+BufSize += 8;
+  else {
+auto LenWithNull = getStrlenWithNull(Builder, Fmt);
+
+// Align the computed length to next 8 byte boundary
+auto TempAdd = Builder.CreateAdd(
+LenWithNull, ConstantInt::get(LenWithNull->getType(), 7U));
+NonConstStrLen = Builder.CreateAnd(
+TempAdd, ConstantInt::get(LenWithNull->getType(), ~7U));
+
+StringContents.push_back(
+StringData("", false, LenWithNull, NonConstStrLen));
+  }
+
+  for (size_t i = 1; i < Args.size(); i++) {
+if (SpecIsCString.test(i)) {
+  StringRef ArgStr;
+  if (getConstantStringInfo(Args[i], ArgStr)) {
+auto alignedLen = alignUp(ArgStr.size() + 1, 8);
+StringContents.push_back(
+StringData((ArgStr.str() + '\0'), /*isConst*/ true,
+   /*RealSize*/ nullptr, /*AlignedSize*/ nullptr));
+BufSize += alignedLen;
+  } else {
+auto LenWithNull = getStrlenWithNull(Builder, Args[i]);
+
+// Align the computed length to next 8 byte boundary
+auto TempAdd = Builder.CreateAdd(
+LenWithNull, ConstantInt::get(LenWithNull->getType(), 7U));
+auto LenWithNullAligned = Builder.CreateAnd(
+TempAdd, ConstantInt::get(LenWithNull->getType(), ~7U));
+
+if (NonConstStrLen) {
+  auto Val = Builder.CreateAdd(LenWithNullAligned, NonConstStrLen,
+   "cumulativeAdd");
+  NonConstStrLen = Val;
+} else
+  NonConstStrLen = LenWithNullAligned;
+
+StringContents.push_back(
+StringData("", false, LenWithNull, LenWithNullAligned));
+  }
+} else
+  // We end up expanding non string arguments to 8 bytes
+  BufSize += 8;
+  }
+
+  // calculate final size value to be passed to printf_alloc
+  Value *SizeToReserve = ConstantInt::get(Builder.getInt64Ty(), BufSize, false);
+  SmallVector Alloc_args;
+  if (NonConstStrLen)
+SizeToReserve = Builder.CreateAdd(NonConstStrLen, SizeToReserve);
+
+  ArgSize = Builder.CreateTrunc(SizeToReserve, Builder.getInt32Ty());
+  Alloc_args.push_back(ArgSize);
+
+  // call the printf_alloc function
+  AttributeList Attr = Attr

[PATCH] D150427: [AMDGPU] Non hostcall printf support for HIP

2023-05-24 Thread Vikram Hegde via Phabricator via cfe-commits
vikramRH updated this revision to Diff 525082.
vikramRH added a comment.

Handled review comments and rebase


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D150427/new/

https://reviews.llvm.org/D150427

Files:
  clang/include/clang/Basic/TargetOptions.h
  clang/include/clang/Driver/Options.td
  clang/lib/CodeGen/CGGPUBuiltin.cpp
  clang/lib/Driver/ToolChains/Clang.cpp
  clang/test/CodeGenHIP/printf_nonhostcall.cpp
  clang/test/Driver/hip-options.hip
  llvm/include/llvm/Transforms/Utils/AMDGPUEmitPrintf.h
  llvm/lib/Transforms/Utils/AMDGPUEmitPrintf.cpp

Index: llvm/lib/Transforms/Utils/AMDGPUEmitPrintf.cpp
===
--- llvm/lib/Transforms/Utils/AMDGPUEmitPrintf.cpp
+++ llvm/lib/Transforms/Utils/AMDGPUEmitPrintf.cpp
@@ -17,6 +17,9 @@
 #include "llvm/Transforms/Utils/AMDGPUEmitPrintf.h"
 #include "llvm/ADT/SparseBitVector.h"
 #include "llvm/Analysis/ValueTracking.h"
+#include "llvm/Support/DataExtractor.h"
+#include "llvm/Support/MD5.h"
+#include "llvm/Support/MathExtras.h"
 
 using namespace llvm;
 
@@ -179,11 +182,7 @@
 
 // Scan the format string to locate all specifiers, and mark the ones that
 // specify a string, i.e, the "%s" specifier with optional '*' characters.
-static void locateCStrings(SparseBitVector<8> &BV, Value *Fmt) {
-  StringRef Str;
-  if (!getConstantStringInfo(Fmt, Str) || Str.empty())
-return;
-
+static void locateCStrings(SparseBitVector<8> &BV, StringRef Str) {
   static const char ConvSpecifiers[] = "diouxXfFeEgGaAcspn";
   size_t SpecPos = 0;
   // Skip the first argument, the format string.
@@ -207,14 +206,298 @@
   }
 }
 
-Value *llvm::emitAMDGPUPrintfCall(IRBuilder<> &Builder,
-  ArrayRef Args) {
+// helper struct to package the string related data
+struct StringData {
+  std::string Str = "";
+  bool isConst = true;
+  Value *RealSize = nullptr;
+  Value *AlignedSize = nullptr;
+
+  StringData(std::string str, bool IC, Value *RS, Value *AS)
+  : Str(str), isConst(IC), RealSize(RS), AlignedSize(AS) {}
+};
+
+// Calculates frame size required for current printf expansion and allocates
+// space on printf buffer. Printf frame includes following contents
+// [ ControlDWord , format string/Hash , Arguments (each aligned to 8 byte) ]
+static Value *callBufferedPrintfStart(
+IRBuilder<> &Builder, ArrayRef &Args, Value *Fmt,
+bool isConstFmtStr, SparseBitVector<8> &SpecIsCString,
+SmallVectorImpl &StringContents, Value *&ArgSize) {
+  Value *NonConstStrLen = nullptr;
+
+  // First 4 bytes to be reserved for control dword
+  size_t BufSize = 4;
+  if (isConstFmtStr)
+// First 8 bytes of MD5 hash
+BufSize += 8;
+  else {
+auto LenWithNull = getStrlenWithNull(Builder, Fmt);
+
+// Align the computed length to next 8 byte boundary
+auto TempAdd = Builder.CreateAdd(
+LenWithNull, ConstantInt::get(LenWithNull->getType(), 7U));
+NonConstStrLen = Builder.CreateAnd(
+TempAdd, ConstantInt::get(LenWithNull->getType(), ~7U));
+
+StringContents.push_back(
+StringData("", false, LenWithNull, NonConstStrLen));
+  }
+
+  for (size_t i = 1; i < Args.size(); i++) {
+if (SpecIsCString.test(i)) {
+  StringRef ArgStr;
+  if (getConstantStringInfo(Args[i], ArgStr)) {
+auto alignedLen = alignTo(ArgStr.size() + 1, 8);
+StringContents.push_back(
+StringData((ArgStr.str() + '\0'), /*isConst*/ true,
+   /*RealSize*/ nullptr, /*AlignedSize*/ nullptr));
+BufSize += alignedLen;
+  } else {
+auto LenWithNull = getStrlenWithNull(Builder, Args[i]);
+
+// Align the computed length to next 8 byte boundary
+auto TempAdd = Builder.CreateAdd(
+LenWithNull, ConstantInt::get(LenWithNull->getType(), 7U));
+auto LenWithNullAligned = Builder.CreateAnd(
+TempAdd, ConstantInt::get(LenWithNull->getType(), ~7U));
+
+if (NonConstStrLen) {
+  auto Val = Builder.CreateAdd(LenWithNullAligned, NonConstStrLen,
+   "cumulativeAdd");
+  NonConstStrLen = Val;
+} else
+  NonConstStrLen = LenWithNullAligned;
+
+StringContents.push_back(
+StringData("", false, LenWithNull, LenWithNullAligned));
+  }
+} else
+  // We end up expanding non string arguments to 8 bytes
+  BufSize += 8;
+  }
+
+  // calculate final size value to be passed to printf_alloc
+  Value *SizeToReserve = ConstantInt::get(Builder.getInt64Ty(), BufSize, false);
+  SmallVector Alloc_args;
+  if (NonConstStrLen)
+SizeToReserve = Builder.CreateAdd(NonConstStrLen, SizeToReserve);
+
+  ArgSize = Builder.CreateTrunc(SizeToReserve, Builder.getInt32Ty());
+  Alloc_args.push_back(ArgSize);
+
+  // call the printf_alloc function
+  AttributeList Attr = AttributeList::get(
+  Builder.getContext(), AttributeList::FunctionIndex, Attribut

[PATCH] D150427: [AMDGPU] Non hostcall printf support for HIP

2023-05-25 Thread Vikram Hegde via Phabricator via cfe-commits
vikramRH updated this revision to Diff 525946.
vikramRH added a comment.

rebase and ping, apologies for the short frequency.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D150427/new/

https://reviews.llvm.org/D150427

Files:
  clang/include/clang/Basic/TargetOptions.h
  clang/include/clang/Driver/Options.td
  clang/lib/CodeGen/CGGPUBuiltin.cpp
  clang/lib/Driver/ToolChains/Clang.cpp
  clang/test/CodeGenHIP/printf_nonhostcall.cpp
  clang/test/Driver/hip-options.hip
  llvm/include/llvm/Transforms/Utils/AMDGPUEmitPrintf.h
  llvm/lib/Transforms/Utils/AMDGPUEmitPrintf.cpp

Index: llvm/lib/Transforms/Utils/AMDGPUEmitPrintf.cpp
===
--- llvm/lib/Transforms/Utils/AMDGPUEmitPrintf.cpp
+++ llvm/lib/Transforms/Utils/AMDGPUEmitPrintf.cpp
@@ -17,6 +17,9 @@
 #include "llvm/Transforms/Utils/AMDGPUEmitPrintf.h"
 #include "llvm/ADT/SparseBitVector.h"
 #include "llvm/Analysis/ValueTracking.h"
+#include "llvm/Support/DataExtractor.h"
+#include "llvm/Support/MD5.h"
+#include "llvm/Support/MathExtras.h"
 
 using namespace llvm;
 
@@ -179,11 +182,7 @@
 
 // Scan the format string to locate all specifiers, and mark the ones that
 // specify a string, i.e, the "%s" specifier with optional '*' characters.
-static void locateCStrings(SparseBitVector<8> &BV, Value *Fmt) {
-  StringRef Str;
-  if (!getConstantStringInfo(Fmt, Str) || Str.empty())
-return;
-
+static void locateCStrings(SparseBitVector<8> &BV, StringRef Str) {
   static const char ConvSpecifiers[] = "diouxXfFeEgGaAcspn";
   size_t SpecPos = 0;
   // Skip the first argument, the format string.
@@ -207,14 +206,298 @@
   }
 }
 
-Value *llvm::emitAMDGPUPrintfCall(IRBuilder<> &Builder,
-  ArrayRef Args) {
+// helper struct to package the string related data
+struct StringData {
+  std::string Str = "";
+  bool isConst = true;
+  Value *RealSize = nullptr;
+  Value *AlignedSize = nullptr;
+
+  StringData(std::string str, bool IC, Value *RS, Value *AS)
+  : Str(str), isConst(IC), RealSize(RS), AlignedSize(AS) {}
+};
+
+// Calculates frame size required for current printf expansion and allocates
+// space on printf buffer. Printf frame includes following contents
+// [ ControlDWord , format string/Hash , Arguments (each aligned to 8 byte) ]
+static Value *callBufferedPrintfStart(
+IRBuilder<> &Builder, ArrayRef &Args, Value *Fmt,
+bool isConstFmtStr, SparseBitVector<8> &SpecIsCString,
+SmallVectorImpl &StringContents, Value *&ArgSize) {
+  Value *NonConstStrLen = nullptr;
+
+  // First 4 bytes to be reserved for control dword
+  size_t BufSize = 4;
+  if (isConstFmtStr)
+// First 8 bytes of MD5 hash
+BufSize += 8;
+  else {
+auto LenWithNull = getStrlenWithNull(Builder, Fmt);
+
+// Align the computed length to next 8 byte boundary
+auto TempAdd = Builder.CreateAdd(
+LenWithNull, ConstantInt::get(LenWithNull->getType(), 7U));
+NonConstStrLen = Builder.CreateAnd(
+TempAdd, ConstantInt::get(LenWithNull->getType(), ~7U));
+
+StringContents.push_back(
+StringData("", false, LenWithNull, NonConstStrLen));
+  }
+
+  for (size_t i = 1; i < Args.size(); i++) {
+if (SpecIsCString.test(i)) {
+  StringRef ArgStr;
+  if (getConstantStringInfo(Args[i], ArgStr)) {
+auto alignedLen = alignTo(ArgStr.size() + 1, 8);
+StringContents.push_back(
+StringData((ArgStr.str() + '\0'), /*isConst*/ true,
+   /*RealSize*/ nullptr, /*AlignedSize*/ nullptr));
+BufSize += alignedLen;
+  } else {
+auto LenWithNull = getStrlenWithNull(Builder, Args[i]);
+
+// Align the computed length to next 8 byte boundary
+auto TempAdd = Builder.CreateAdd(
+LenWithNull, ConstantInt::get(LenWithNull->getType(), 7U));
+auto LenWithNullAligned = Builder.CreateAnd(
+TempAdd, ConstantInt::get(LenWithNull->getType(), ~7U));
+
+if (NonConstStrLen) {
+  auto Val = Builder.CreateAdd(LenWithNullAligned, NonConstStrLen,
+   "cumulativeAdd");
+  NonConstStrLen = Val;
+} else
+  NonConstStrLen = LenWithNullAligned;
+
+StringContents.push_back(
+StringData("", false, LenWithNull, LenWithNullAligned));
+  }
+} else
+  // We end up expanding non string arguments to 8 bytes
+  BufSize += 8;
+  }
+
+  // calculate final size value to be passed to printf_alloc
+  Value *SizeToReserve = ConstantInt::get(Builder.getInt64Ty(), BufSize, false);
+  SmallVector Alloc_args;
+  if (NonConstStrLen)
+SizeToReserve = Builder.CreateAdd(NonConstStrLen, SizeToReserve);
+
+  ArgSize = Builder.CreateTrunc(SizeToReserve, Builder.getInt32Ty());
+  Alloc_args.push_back(ArgSize);
+
+  // call the printf_alloc function
+  AttributeList Attr = AttributeList::get(
+  Builder.getContext(), AttributeList::Functi

[PATCH] D150427: [AMDGPU] Non hostcall printf support for HIP

2023-05-31 Thread Vikram Hegde via Phabricator via cfe-commits
vikramRH updated this revision to Diff 526963.
vikramRH added a comment.

Handled most review comments


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D150427/new/

https://reviews.llvm.org/D150427

Files:
  clang/include/clang/Basic/TargetOptions.h
  clang/include/clang/Driver/Options.td
  clang/lib/CodeGen/CGGPUBuiltin.cpp
  clang/lib/Driver/ToolChains/Clang.cpp
  clang/test/CodeGenHIP/printf_nonhostcall.cpp
  clang/test/Driver/hip-options.hip
  llvm/include/llvm/Transforms/Utils/AMDGPUEmitPrintf.h
  llvm/lib/Transforms/Utils/AMDGPUEmitPrintf.cpp

Index: llvm/lib/Transforms/Utils/AMDGPUEmitPrintf.cpp
===
--- llvm/lib/Transforms/Utils/AMDGPUEmitPrintf.cpp
+++ llvm/lib/Transforms/Utils/AMDGPUEmitPrintf.cpp
@@ -17,6 +17,9 @@
 #include "llvm/Transforms/Utils/AMDGPUEmitPrintf.h"
 #include "llvm/ADT/SparseBitVector.h"
 #include "llvm/Analysis/ValueTracking.h"
+#include "llvm/Support/DataExtractor.h"
+#include "llvm/Support/MD5.h"
+#include "llvm/Support/MathExtras.h"
 
 using namespace llvm;
 
@@ -179,11 +182,7 @@
 
 // Scan the format string to locate all specifiers, and mark the ones that
 // specify a string, i.e, the "%s" specifier with optional '*' characters.
-static void locateCStrings(SparseBitVector<8> &BV, Value *Fmt) {
-  StringRef Str;
-  if (!getConstantStringInfo(Fmt, Str) || Str.empty())
-return;
-
+static void locateCStrings(SparseBitVector<8> &BV, StringRef Str) {
   static const char ConvSpecifiers[] = "diouxXfFeEgGaAcspn";
   size_t SpecPos = 0;
   // Skip the first argument, the format string.
@@ -207,14 +206,305 @@
   }
 }
 
-Value *llvm::emitAMDGPUPrintfCall(IRBuilder<> &Builder,
-  ArrayRef Args) {
+// helper struct to package the string related data
+struct StringData {
+  std::string Str;
+  Value *RealSize = nullptr;
+  Value *AlignedSize = nullptr;
+  bool isConst = true;
+
+  StringData(std::string str, Value *RS, Value *AS, bool IC)
+  : Str(str), RealSize(RS), AlignedSize(AS), isConst(IC) {}
+};
+
+// Calculates frame size required for current printf expansion and allocates
+// space on printf buffer. Printf frame includes following contents
+// [ ControlDWord , format string/Hash , Arguments (each aligned to 8 byte) ]
+static Value *callBufferedPrintfStart(
+IRBuilder<> &Builder, ArrayRef Args, Value *Fmt,
+bool isConstFmtStr, SparseBitVector<8> &SpecIsCString,
+SmallVectorImpl &StringContents, Value *&ArgSize) {
+  Module *M = Builder.GetInsertBlock()->getModule();
+  Value *NonConstStrLen = nullptr;
+  Value *LenWithNull = nullptr;
+  Value *LenWithNullAligned = nullptr;
+  Value *TempAdd = nullptr;
+
+  // First 4 bytes to be reserved for control dword
+  size_t BufSize = 4;
+  if (isConstFmtStr)
+// First 8 bytes of MD5 hash
+BufSize += 8;
+  else {
+LenWithNull = getStrlenWithNull(Builder, Fmt);
+
+// Align the computed length to next 8 byte boundary
+TempAdd = Builder.CreateAdd(LenWithNull,
+ConstantInt::get(LenWithNull->getType(), 7U));
+NonConstStrLen = Builder.CreateAnd(
+TempAdd, ConstantInt::get(LenWithNull->getType(), ~7U));
+
+StringContents.push_back(
+StringData("", LenWithNull, NonConstStrLen, false));
+  }
+
+  for (size_t i = 1; i < Args.size(); i++) {
+if (SpecIsCString.test(i)) {
+  StringRef ArgStr;
+  if (getConstantStringInfo(Args[i], ArgStr)) {
+auto alignedLen = alignTo(ArgStr.size() + 1, 8);
+StringContents.push_back(StringData(
+(ArgStr.str() + '\0'),
+/*RealSize*/ nullptr, /*AlignedSize*/ nullptr, /*isConst*/ true));
+BufSize += alignedLen;
+  } else {
+LenWithNull = getStrlenWithNull(Builder, Args[i]);
+
+// Align the computed length to next 8 byte boundary
+TempAdd = Builder.CreateAdd(
+LenWithNull, ConstantInt::get(LenWithNull->getType(), 7U));
+LenWithNullAligned = Builder.CreateAnd(
+TempAdd, ConstantInt::get(LenWithNull->getType(), ~7U));
+
+if (NonConstStrLen) {
+  auto Val = Builder.CreateAdd(LenWithNullAligned, NonConstStrLen,
+   "cumulativeAdd");
+  NonConstStrLen = Val;
+} else
+  NonConstStrLen = LenWithNullAligned;
+
+StringContents.push_back(
+StringData("", LenWithNull, LenWithNullAligned, false));
+  }
+} else
+  // We end up expanding non string arguments to 8 bytes
+  BufSize += 8;
+  }
+
+  // calculate final size value to be passed to printf_alloc
+  Value *SizeToReserve = ConstantInt::get(Builder.getInt64Ty(), BufSize, false);
+  SmallVector Alloc_args;
+  if (NonConstStrLen)
+SizeToReserve = Builder.CreateAdd(NonConstStrLen, SizeToReserve);
+
+  ArgSize = Builder.CreateTrunc(SizeToReserve, Builder.getInt32Ty());
+  Alloc_args.push_back(ArgSize);
+
+  // call the pr

[PATCH] D150427: [AMDGPU] Non hostcall printf support for HIP

2023-05-31 Thread Vikram Hegde via Phabricator via cfe-commits
vikramRH marked 11 inline comments as done.
vikramRH added inline comments.



Comment at: clang/test/CodeGenHIP/printf_nonhostcall.cpp:64
+// CHECK-NEXT:[[PRINTBUFFNEXTPTR4:%.*]] = getelementptr inbounds i8, ptr 
addrspace(1) [[PRINTBUFFNEXTPTR3]], i64 [[TMP13]]
+// CHECK-NEXT:[[TMP21:%.*]] = ptrtoint ptr [[TMP1]] to i64
+// CHECK-NEXT:store i64 [[TMP21]], ptr addrspace(1) [[PRINTBUFFNEXTPTR4]], 
align 8

arsenm wrote:
> Can directly store the pointer, don't ptrtoint? Should have some other 
> pointer address spaces tested, the spec is quiet on how %p is supposed to 
> work with different sized pointers 
Added few more address space cases in foo2(), do you think we need more ?



Comment at: clang/test/CodeGenHIP/printf_nonhostcall.cpp:228
+  return printf(s, 10);
+}

arsenm wrote:
> Need some vector tests, especially 3 x vectors 
vectors are not yet supported as in the hostcall case since the code currently 
runs for HIP only. I plan to port the OpenCL lowering also here along with 
hostcall support . we would need to extend hostcall and this implementation to 
support vectors as part of the porting activity.



Comment at: llvm/lib/Transforms/Utils/AMDGPUEmitPrintf.cpp:211
+struct StringData {
+  std::string Str = "";
+  bool isConst = true;

arsenm wrote:
> arsenm wrote:
> > Don't need = ""
> Can't you just use the raw StringRef out of getConstantStringInfo
The structure just caches the string calculated from getConstantStringInfo



Comment at: llvm/lib/Transforms/Utils/AMDGPUEmitPrintf.cpp:381
+} else {
+  if (Args[i]->getType()->isDoubleTy())
+WhatToStore.push_back(Args[i]);

arsenm wrote:
> Don't see why isDoubleTy would be special cased here and not part of 
> fitArgInto64Bits
The redundant cast instructions and this case were due to function 
"fitArgInto64Bits", which I reused from hostcall Implementation and did not 
want to modify. This dependency has been removed now and I have inlined the code



Comment at: llvm/lib/Transforms/Utils/AMDGPUEmitPrintf.cpp:458
+auto CreateControlDWord = M->getOrInsertFunction(
+StringRef("__ockl_create_control_dword"), Builder.getInt32Ty(),
+Builder.getInt32Ty(), Int1Ty, Int1Ty);

arsenm wrote:
> sameerds wrote:
> > vikramRH wrote:
> > > arsenm wrote:
> > > > vikramRH wrote:
> > > > > arsenm wrote:
> > > > > > Do we really need another ockl control variable for this? Why isn't 
> > > > > > it a parameter? printf=stdout always 
> > > > > There are certain HIP API's such as "hip_assert" that output to 
> > > > > stderr. currently such API's are supported via hostcalls. Although 
> > > > > this implementation does not currently support the API's ,its kept as 
> > > > > an option. 
> > > > Right but the way to handle that would be a parameter for where to 
> > > > output, not an externally set global 
> > > I am not clear here, you expect additional inputs to device lib function ?
> > @arsenm, this "control word" written into the buffer. In that sense, it is 
> > indeed a parameter passed from device to host as part of the printf packet. 
> > It is not yet another global variable.
> I'm not following why this part requires introducing another call into device 
> libs. It's expanding this not-really defined API. I'd expect this to be a 
> trivial function we can expand inline here and write directly into the 
> parameter buffer?
I started the implementation keeping in mind the device side asserts currenlty 
supported via hostcall. They use explicit calls to such device lib functions in 
their definitions. I could not support them now but these would be required if 
I ever decide to. also I never ended up inlining this call. do you think its 
really necessary ?


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D150427/new/

https://reviews.llvm.org/D150427

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D150427: [AMDGPU] Non hostcall printf support for HIP

2023-05-31 Thread Vikram Hegde via Phabricator via cfe-commits
vikramRH marked 2 inline comments as done.
vikramRH added inline comments.



Comment at: clang/include/clang/Driver/Options.td:1030
   NegFlag>;
+def mprintf_kind_EQ : Joined<["-"], "mprintf-kind=">, Group,
+  HelpText<"Specify the printf lowering scheme (AMDGPU only), allowed values 
are "

arsenm wrote:
> I'm a bit worried this is introducing a side-ABI option not captured in the 
> triple or at least module flags 
I did not understand the concern here. I agree The usage of option itself is 
not the most robust solution and it would have been better if we could do this 
without user intervention. but I do not see ways to do that now (atleast not 
immediately).


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D150427/new/

https://reviews.llvm.org/D150427

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D150427: [AMDGPU] Non hostcall printf support for HIP

2023-06-01 Thread Vikram Hegde via Phabricator via cfe-commits
vikramRH updated this revision to Diff 527462.
vikramRH added a comment.
Herald added a subscriber: jdoerfert.

Few additional changes,

1. reflect printf-kind in module flags metadata
2. Test cases for the change


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D150427/new/

https://reviews.llvm.org/D150427

Files:
  clang/include/clang/Basic/TargetOptions.h
  clang/include/clang/Driver/Options.td
  clang/lib/CodeGen/CGGPUBuiltin.cpp
  clang/lib/CodeGen/CodeGenModule.cpp
  clang/lib/Driver/ToolChains/Clang.cpp
  clang/test/CodeGenHIP/default-attributes.hip
  clang/test/CodeGenHIP/printf-kind-module-flag.hip
  clang/test/CodeGenHIP/printf_nonhostcall.cpp
  clang/test/CodeGenHIP/sanitize-undefined-null.hip
  clang/test/Driver/hip-options.hip
  llvm/include/llvm/Transforms/Utils/AMDGPUEmitPrintf.h
  llvm/lib/Transforms/Utils/AMDGPUEmitPrintf.cpp

Index: llvm/lib/Transforms/Utils/AMDGPUEmitPrintf.cpp
===
--- llvm/lib/Transforms/Utils/AMDGPUEmitPrintf.cpp
+++ llvm/lib/Transforms/Utils/AMDGPUEmitPrintf.cpp
@@ -17,6 +17,9 @@
 #include "llvm/Transforms/Utils/AMDGPUEmitPrintf.h"
 #include "llvm/ADT/SparseBitVector.h"
 #include "llvm/Analysis/ValueTracking.h"
+#include "llvm/Support/DataExtractor.h"
+#include "llvm/Support/MD5.h"
+#include "llvm/Support/MathExtras.h"
 
 using namespace llvm;
 
@@ -179,11 +182,7 @@
 
 // Scan the format string to locate all specifiers, and mark the ones that
 // specify a string, i.e, the "%s" specifier with optional '*' characters.
-static void locateCStrings(SparseBitVector<8> &BV, Value *Fmt) {
-  StringRef Str;
-  if (!getConstantStringInfo(Fmt, Str) || Str.empty())
-return;
-
+static void locateCStrings(SparseBitVector<8> &BV, StringRef Str) {
   static const char ConvSpecifiers[] = "diouxXfFeEgGaAcspn";
   size_t SpecPos = 0;
   // Skip the first argument, the format string.
@@ -207,14 +206,305 @@
   }
 }
 
-Value *llvm::emitAMDGPUPrintfCall(IRBuilder<> &Builder,
-  ArrayRef Args) {
+// helper struct to package the string related data
+struct StringData {
+  std::string Str;
+  Value *RealSize = nullptr;
+  Value *AlignedSize = nullptr;
+  bool isConst = true;
+
+  StringData(std::string str, Value *RS, Value *AS, bool IC)
+  : Str(str), RealSize(RS), AlignedSize(AS), isConst(IC) {}
+};
+
+// Calculates frame size required for current printf expansion and allocates
+// space on printf buffer. Printf frame includes following contents
+// [ ControlDWord , format string/Hash , Arguments (each aligned to 8 byte) ]
+static Value *callBufferedPrintfStart(
+IRBuilder<> &Builder, ArrayRef Args, Value *Fmt,
+bool isConstFmtStr, SparseBitVector<8> &SpecIsCString,
+SmallVectorImpl &StringContents, Value *&ArgSize) {
+  Module *M = Builder.GetInsertBlock()->getModule();
+  Value *NonConstStrLen = nullptr;
+  Value *LenWithNull = nullptr;
+  Value *LenWithNullAligned = nullptr;
+  Value *TempAdd = nullptr;
+
+  // First 4 bytes to be reserved for control dword
+  size_t BufSize = 4;
+  if (isConstFmtStr)
+// First 8 bytes of MD5 hash
+BufSize += 8;
+  else {
+LenWithNull = getStrlenWithNull(Builder, Fmt);
+
+// Align the computed length to next 8 byte boundary
+TempAdd = Builder.CreateAdd(LenWithNull,
+ConstantInt::get(LenWithNull->getType(), 7U));
+NonConstStrLen = Builder.CreateAnd(
+TempAdd, ConstantInt::get(LenWithNull->getType(), ~7U));
+
+StringContents.push_back(
+StringData("", LenWithNull, NonConstStrLen, false));
+  }
+
+  for (size_t i = 1; i < Args.size(); i++) {
+if (SpecIsCString.test(i)) {
+  StringRef ArgStr;
+  if (getConstantStringInfo(Args[i], ArgStr)) {
+auto alignedLen = alignTo(ArgStr.size() + 1, 8);
+StringContents.push_back(StringData(
+(ArgStr.str() + '\0'),
+/*RealSize*/ nullptr, /*AlignedSize*/ nullptr, /*isConst*/ true));
+BufSize += alignedLen;
+  } else {
+LenWithNull = getStrlenWithNull(Builder, Args[i]);
+
+// Align the computed length to next 8 byte boundary
+TempAdd = Builder.CreateAdd(
+LenWithNull, ConstantInt::get(LenWithNull->getType(), 7U));
+LenWithNullAligned = Builder.CreateAnd(
+TempAdd, ConstantInt::get(LenWithNull->getType(), ~7U));
+
+if (NonConstStrLen) {
+  auto Val = Builder.CreateAdd(LenWithNullAligned, NonConstStrLen,
+   "cumulativeAdd");
+  NonConstStrLen = Val;
+} else
+  NonConstStrLen = LenWithNullAligned;
+
+StringContents.push_back(
+StringData("", LenWithNull, LenWithNullAligned, false));
+  }
+} else
+  // We end up expanding non string arguments to 8 bytes
+  BufSize += 8;
+  }
+
+  // calculate final size value to be passed to printf_alloc
+  Value *SizeToReserve = Con

[PATCH] D150427: [AMDGPU] Non hostcall printf support for HIP

2023-06-01 Thread Vikram Hegde via Phabricator via cfe-commits
vikramRH updated this revision to Diff 527472.
vikramRH added a comment.

new line at the end of test file


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D150427/new/

https://reviews.llvm.org/D150427

Files:
  clang/include/clang/Basic/TargetOptions.h
  clang/include/clang/Driver/Options.td
  clang/lib/CodeGen/CGGPUBuiltin.cpp
  clang/lib/CodeGen/CodeGenModule.cpp
  clang/lib/Driver/ToolChains/Clang.cpp
  clang/test/CodeGenHIP/default-attributes.hip
  clang/test/CodeGenHIP/printf-kind-module-flag.hip
  clang/test/CodeGenHIP/printf_nonhostcall.cpp
  clang/test/CodeGenHIP/sanitize-undefined-null.hip
  clang/test/Driver/hip-options.hip
  llvm/include/llvm/Transforms/Utils/AMDGPUEmitPrintf.h
  llvm/lib/Transforms/Utils/AMDGPUEmitPrintf.cpp

Index: llvm/lib/Transforms/Utils/AMDGPUEmitPrintf.cpp
===
--- llvm/lib/Transforms/Utils/AMDGPUEmitPrintf.cpp
+++ llvm/lib/Transforms/Utils/AMDGPUEmitPrintf.cpp
@@ -17,6 +17,9 @@
 #include "llvm/Transforms/Utils/AMDGPUEmitPrintf.h"
 #include "llvm/ADT/SparseBitVector.h"
 #include "llvm/Analysis/ValueTracking.h"
+#include "llvm/Support/DataExtractor.h"
+#include "llvm/Support/MD5.h"
+#include "llvm/Support/MathExtras.h"
 
 using namespace llvm;
 
@@ -179,11 +182,7 @@
 
 // Scan the format string to locate all specifiers, and mark the ones that
 // specify a string, i.e, the "%s" specifier with optional '*' characters.
-static void locateCStrings(SparseBitVector<8> &BV, Value *Fmt) {
-  StringRef Str;
-  if (!getConstantStringInfo(Fmt, Str) || Str.empty())
-return;
-
+static void locateCStrings(SparseBitVector<8> &BV, StringRef Str) {
   static const char ConvSpecifiers[] = "diouxXfFeEgGaAcspn";
   size_t SpecPos = 0;
   // Skip the first argument, the format string.
@@ -207,14 +206,305 @@
   }
 }
 
-Value *llvm::emitAMDGPUPrintfCall(IRBuilder<> &Builder,
-  ArrayRef Args) {
+// helper struct to package the string related data
+struct StringData {
+  std::string Str;
+  Value *RealSize = nullptr;
+  Value *AlignedSize = nullptr;
+  bool isConst = true;
+
+  StringData(std::string str, Value *RS, Value *AS, bool IC)
+  : Str(str), RealSize(RS), AlignedSize(AS), isConst(IC) {}
+};
+
+// Calculates frame size required for current printf expansion and allocates
+// space on printf buffer. Printf frame includes following contents
+// [ ControlDWord , format string/Hash , Arguments (each aligned to 8 byte) ]
+static Value *callBufferedPrintfStart(
+IRBuilder<> &Builder, ArrayRef Args, Value *Fmt,
+bool isConstFmtStr, SparseBitVector<8> &SpecIsCString,
+SmallVectorImpl &StringContents, Value *&ArgSize) {
+  Module *M = Builder.GetInsertBlock()->getModule();
+  Value *NonConstStrLen = nullptr;
+  Value *LenWithNull = nullptr;
+  Value *LenWithNullAligned = nullptr;
+  Value *TempAdd = nullptr;
+
+  // First 4 bytes to be reserved for control dword
+  size_t BufSize = 4;
+  if (isConstFmtStr)
+// First 8 bytes of MD5 hash
+BufSize += 8;
+  else {
+LenWithNull = getStrlenWithNull(Builder, Fmt);
+
+// Align the computed length to next 8 byte boundary
+TempAdd = Builder.CreateAdd(LenWithNull,
+ConstantInt::get(LenWithNull->getType(), 7U));
+NonConstStrLen = Builder.CreateAnd(
+TempAdd, ConstantInt::get(LenWithNull->getType(), ~7U));
+
+StringContents.push_back(
+StringData("", LenWithNull, NonConstStrLen, false));
+  }
+
+  for (size_t i = 1; i < Args.size(); i++) {
+if (SpecIsCString.test(i)) {
+  StringRef ArgStr;
+  if (getConstantStringInfo(Args[i], ArgStr)) {
+auto alignedLen = alignTo(ArgStr.size() + 1, 8);
+StringContents.push_back(StringData(
+(ArgStr.str() + '\0'),
+/*RealSize*/ nullptr, /*AlignedSize*/ nullptr, /*isConst*/ true));
+BufSize += alignedLen;
+  } else {
+LenWithNull = getStrlenWithNull(Builder, Args[i]);
+
+// Align the computed length to next 8 byte boundary
+TempAdd = Builder.CreateAdd(
+LenWithNull, ConstantInt::get(LenWithNull->getType(), 7U));
+LenWithNullAligned = Builder.CreateAnd(
+TempAdd, ConstantInt::get(LenWithNull->getType(), ~7U));
+
+if (NonConstStrLen) {
+  auto Val = Builder.CreateAdd(LenWithNullAligned, NonConstStrLen,
+   "cumulativeAdd");
+  NonConstStrLen = Val;
+} else
+  NonConstStrLen = LenWithNullAligned;
+
+StringContents.push_back(
+StringData("", LenWithNull, LenWithNullAligned, false));
+  }
+} else
+  // We end up expanding non string arguments to 8 bytes
+  BufSize += 8;
+  }
+
+  // calculate final size value to be passed to printf_alloc
+  Value *SizeToReserve = ConstantInt::get(Builder.getInt64Ty(), BufSize, false);
+  SmallVector Alloc_args;
+  if (NonConstStrLen)
+   

[PATCH] D138702: support for HIP non hostcall printf

2023-02-10 Thread Vikram Hegde via Phabricator via cfe-commits
vikramRH abandoned this revision.
vikramRH added a comment.

A new implentation is under discussion


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D138702/new/

https://reviews.llvm.org/D138702

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D150427: [AMDGPU] Non hostcall printf support for HIP

2023-05-11 Thread Vikram Hegde via Phabricator via cfe-commits
vikramRH created this revision.
vikramRH added reviewers: sameerds, b-sumner, yaxunl, arsenm.
Herald added subscribers: hoy, kerbowa, hiraditya, Anastasia, tpr, dstuttard, 
jvesely, kzhuravl.
Herald added a project: All.
vikramRH requested review of this revision.
Herald added subscribers: llvm-commits, cfe-commits, MaskRay, wdng.
Herald added projects: clang, LLVM.

This is an alternative to currently existing hostcall implementation and uses 
printf buffer similar to OpenCL,
The data stored in the buffer (i.e the data frame) for each printf call are as 
follows,

1. Control DWord - contains info regarding stream, format string constness and 
size of data frame
2. Hash of the format string (if constant) else the format string itself
3. Printf arguments (each aligned to 8 byte boundary)

The format string Hash is generated using LLVM's MD5 Message-Digest Algorithm 
implementation and only low 64 bits are used.
The implementation still uses amdhsa metadata and hash is stored as part of 
format string itself to ensure
minimal changes in runtime.


Repository:
  rG LLVM Github Monorepo

https://reviews.llvm.org/D150427

Files:
  clang/include/clang/Basic/LangOptions.def
  clang/include/clang/Basic/LangOptions.h
  clang/include/clang/Driver/Options.td
  clang/lib/CodeGen/CGGPUBuiltin.cpp
  clang/lib/Driver/ToolChains/Clang.cpp
  clang/test/CodeGenHIP/printf_nonhostcall.cpp
  clang/test/Driver/hip-options.hip
  llvm/include/llvm/Transforms/Utils/AMDGPUEmitPrintf.h
  llvm/lib/Transforms/Utils/AMDGPUEmitPrintf.cpp

Index: llvm/lib/Transforms/Utils/AMDGPUEmitPrintf.cpp
===
--- llvm/lib/Transforms/Utils/AMDGPUEmitPrintf.cpp
+++ llvm/lib/Transforms/Utils/AMDGPUEmitPrintf.cpp
@@ -17,6 +17,8 @@
 #include "llvm/Transforms/Utils/AMDGPUEmitPrintf.h"
 #include "llvm/ADT/SparseBitVector.h"
 #include "llvm/Analysis/ValueTracking.h"
+#include "llvm/Support/DataExtractor.h"
+#include "llvm/Support/MD5.h"
 
 using namespace llvm;
 
@@ -179,9 +181,8 @@
 
 // Scan the format string to locate all specifiers, and mark the ones that
 // specify a string, i.e, the "%s" specifier with optional '*' characters.
-static void locateCStrings(SparseBitVector<8> &BV, Value *Fmt) {
-  StringRef Str;
-  if (!getConstantStringInfo(Fmt, Str) || Str.empty())
+static void locateCStrings(SparseBitVector<8> &BV, Value *Fmt, StringRef &FmtStr) {
+  if (!getConstantStringInfo(Fmt, FmtStr) || FmtStr.empty())
 return;
 
   static const char ConvSpecifiers[] = "diouxXfFeEgGaAcspn";
@@ -189,17 +190,17 @@
   // Skip the first argument, the format string.
   unsigned ArgIdx = 1;
 
-  while ((SpecPos = Str.find_first_of('%', SpecPos)) != StringRef::npos) {
-if (Str[SpecPos + 1] == '%') {
+  while ((SpecPos = FmtStr.find_first_of('%', SpecPos)) != StringRef::npos) {
+if (FmtStr[SpecPos + 1] == '%') {
   SpecPos += 2;
   continue;
 }
-auto SpecEnd = Str.find_first_of(ConvSpecifiers, SpecPos);
+auto SpecEnd = FmtStr.find_first_of(ConvSpecifiers, SpecPos);
 if (SpecEnd == StringRef::npos)
   return;
-auto Spec = Str.slice(SpecPos, SpecEnd + 1);
+auto Spec = FmtStr.slice(SpecPos, SpecEnd + 1);
 ArgIdx += Spec.count('*');
-if (Str[SpecEnd] == 's') {
+if (FmtStr[SpecEnd] == 's') {
   BV.set(ArgIdx);
 }
 SpecPos = SpecEnd + 1;
@@ -207,14 +208,312 @@
   }
 }
 
+// helper struct to package the string related data
+typedef struct S {
+  std::string Str;
+  bool isConst;
+  Value *RealSize;
+  Value *AlignedSize;
+
+  S(std::string str = "", bool IC = true, Value *RS = nullptr,
+Value *AS = nullptr)
+  : Str(str), isConst(IC), RealSize(RS), AlignedSize(AS) {}
+} StringData;
+
+static size_t alignUp(size_t Value, uint alignment) {
+  return (Value + alignment - 1) & ~(alignment - 1);
+}
+
+// Calculates frame size required for current printf expansion and allocates
+// space on printf buffer. Printf frame includes following contents
+// [ ControlDWord , format string/Hash , Arguments (each aligned to 8 byte) ]
+static Value *callBufferedPrintfStart(IRBuilder<> &Builder,
+ ArrayRef &Args, Value *Fmt,
+ StringRef &FmtStr,
+ SparseBitVector<8> &SpecIsCString,
+ SmallVector &StringContents,
+ Value *&ArgSize) {
+  Value *NonConstStrLen = nullptr;
+
+  // First 8 bytes to be reserved for control dword
+  size_t BufSize = 4;
+  if (!FmtStr.empty())
+// First 8 bytes of MD5 hash
+BufSize += 8;
+  else {
+auto LenWithNull = getStrlenWithNull(Builder, Fmt);
+
+// Align the computed length to next 8 byte boundary
+auto TempAdd = Builder.CreateAdd(
+LenWithNull, ConstantInt::get(LenWithNull->getType(), 7U));
+NonConstStrLen = Builder.CreateAnd(
+TempAdd, ConstantInt::get(LenWithNull->getType(), ~7U));
+
+StringContents.

[PATCH] D150427: [AMDGPU] Non hostcall printf support for HIP

2023-05-17 Thread Vikram Hegde via Phabricator via cfe-commits
vikramRH updated this revision to Diff 522954.
vikramRH added a comment.



1. Updating D150427 : [AMDGPU] Non hostcall 
printf support for HIP #
2. Enter a brief description of the changes included in this update.
3. The first line is used as subject, next lines as comment. #
4. If you intended to create a new revision, use:
5. $ arc diff --create

Handled review comments


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D150427/new/

https://reviews.llvm.org/D150427

Files:
  clang/include/clang/Basic/TargetOptions.h
  clang/include/clang/Driver/Options.td
  clang/lib/CodeGen/CGGPUBuiltin.cpp
  clang/lib/Driver/ToolChains/Clang.cpp
  clang/test/CodeGenHIP/printf_nonhostcall.cpp
  clang/test/Driver/hip-options.hip
  llvm/include/llvm/Transforms/Utils/AMDGPUEmitPrintf.h
  llvm/lib/Transforms/Utils/AMDGPUEmitPrintf.cpp

Index: llvm/lib/Transforms/Utils/AMDGPUEmitPrintf.cpp
===
--- llvm/lib/Transforms/Utils/AMDGPUEmitPrintf.cpp
+++ llvm/lib/Transforms/Utils/AMDGPUEmitPrintf.cpp
@@ -17,6 +17,8 @@
 #include "llvm/Transforms/Utils/AMDGPUEmitPrintf.h"
 #include "llvm/ADT/SparseBitVector.h"
 #include "llvm/Analysis/ValueTracking.h"
+#include "llvm/Support/DataExtractor.h"
+#include "llvm/Support/MD5.h"
 
 using namespace llvm;
 
@@ -179,9 +181,8 @@
 
 // Scan the format string to locate all specifiers, and mark the ones that
 // specify a string, i.e, the "%s" specifier with optional '*' characters.
-static void locateCStrings(SparseBitVector<8> &BV, Value *Fmt) {
-  StringRef Str;
-  if (!getConstantStringInfo(Fmt, Str) || Str.empty())
+static void locateCStrings(SparseBitVector<8> &BV, Value *Fmt, StringRef &FmtStr) {
+  if (!getConstantStringInfo(Fmt, FmtStr) || FmtStr.empty())
 return;
 
   static const char ConvSpecifiers[] = "diouxXfFeEgGaAcspn";
@@ -189,17 +190,17 @@
   // Skip the first argument, the format string.
   unsigned ArgIdx = 1;
 
-  while ((SpecPos = Str.find_first_of('%', SpecPos)) != StringRef::npos) {
-if (Str[SpecPos + 1] == '%') {
+  while ((SpecPos = FmtStr.find_first_of('%', SpecPos)) != StringRef::npos) {
+if (FmtStr[SpecPos + 1] == '%') {
   SpecPos += 2;
   continue;
 }
-auto SpecEnd = Str.find_first_of(ConvSpecifiers, SpecPos);
+auto SpecEnd = FmtStr.find_first_of(ConvSpecifiers, SpecPos);
 if (SpecEnd == StringRef::npos)
   return;
-auto Spec = Str.slice(SpecPos, SpecEnd + 1);
+auto Spec = FmtStr.slice(SpecPos, SpecEnd + 1);
 ArgIdx += Spec.count('*');
-if (Str[SpecEnd] == 's') {
+if (FmtStr[SpecEnd] == 's') {
   BV.set(ArgIdx);
 }
 SpecPos = SpecEnd + 1;
@@ -207,14 +208,300 @@
   }
 }
 
+// helper struct to package the string related data
+typedef struct S {
+  std::string Str;
+  bool isConst;
+  Value *RealSize;
+  Value *AlignedSize;
+
+  S(std::string str = "", bool IC = true, Value *RS = nullptr,
+Value *AS = nullptr)
+  : Str(str), isConst(IC), RealSize(RS), AlignedSize(AS) {}
+} StringData;
+
+static size_t alignUp(size_t Value, uint alignment) {
+  return (Value + alignment - 1) & ~(alignment - 1);
+}
+
+// Calculates frame size required for current printf expansion and allocates
+// space on printf buffer. Printf frame includes following contents
+// [ ControlDWord , format string/Hash , Arguments (each aligned to 8 byte) ]
+static Value *callBufferedPrintfStart(IRBuilder<> &Builder,
+ ArrayRef &Args, Value *Fmt,
+ StringRef &FmtStr,
+ SparseBitVector<8> &SpecIsCString,
+ SmallVector &StringContents,
+ Value *&ArgSize) {
+  Value *NonConstStrLen = nullptr;
+
+  // First 8 bytes to be reserved for control dword
+  size_t BufSize = 4;
+  if (!FmtStr.empty())
+// First 8 bytes of MD5 hash
+BufSize += 8;
+  else {
+auto LenWithNull = getStrlenWithNull(Builder, Fmt);
+
+// Align the computed length to next 8 byte boundary
+auto TempAdd = Builder.CreateAdd(
+LenWithNull, ConstantInt::get(LenWithNull->getType(), 7U));
+NonConstStrLen = Builder.CreateAnd(
+TempAdd, ConstantInt::get(LenWithNull->getType(), ~7U));
+
+StringContents.push_back(
+StringData("", false, LenWithNull, NonConstStrLen));
+  }
+
+  StringRef ArgStr;
+  for (size_t i = 1; i < Args.size(); i++) {
+if (SpecIsCString.test(i)) {
+  if (getConstantStringInfo(Args[i], ArgStr)) {
+auto alignedLen = alignUp(ArgStr.size() + 1, 8);
+StringContents.push_back(StringData(ArgStr.str() + '\0'));
+BufSize += alignedLen;
+  } else {
+auto LenWithNull = getStrlenWithNull(Builder, Args[i]);
+
+// Align the computed length to next 8 byte boundary
+auto TempAdd = Builder.CreateAdd(
+LenWithNull, ConstantInt::get(Le

[PATCH] D150427: [AMDGPU] Non hostcall printf support for HIP

2023-05-17 Thread Vikram Hegde via Phabricator via cfe-commits
vikramRH added a comment.

In D150427#4338334 , @jhuber6 wrote:

> Where does the runtime implementation of this live? I'm not very familiar 
> with the HIP / hostcall ecosystem.

Currently there is a rocclr review up internally for the runtime side support


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D150427/new/

https://reviews.llvm.org/D150427

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D150427: [AMDGPU] Non hostcall printf support for HIP

2023-05-17 Thread Vikram Hegde via Phabricator via cfe-commits
vikramRH added inline comments.



Comment at: llvm/lib/Transforms/Utils/AMDGPUEmitPrintf.cpp:458
+auto CreateControlDWord = M->getOrInsertFunction(
+StringRef("__ockl_create_control_dword"), Builder.getInt32Ty(),
+Builder.getInt32Ty(), Int1Ty, Int1Ty);

arsenm wrote:
> Do we really need another ockl control variable for this? Why isn't it a 
> parameter? printf=stdout always 
There are certain HIP API's such as "hip_assert" that output to stderr. 
currently such API's are supported via hostcalls. Although this implementation 
does not currently support the API's ,its kept as an option. 


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D150427/new/

https://reviews.llvm.org/D150427

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D150427: [AMDGPU] Non hostcall printf support for HIP

2023-05-18 Thread Vikram Hegde via Phabricator via cfe-commits
vikramRH updated this revision to Diff 523344.
vikramRH added a comment.

Hanldles Review comments from @sameerds


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D150427/new/

https://reviews.llvm.org/D150427

Files:
  clang/include/clang/Basic/TargetOptions.h
  clang/include/clang/Driver/Options.td
  clang/lib/CodeGen/CGGPUBuiltin.cpp
  clang/lib/Driver/ToolChains/Clang.cpp
  clang/test/CodeGenHIP/printf_nonhostcall.cpp
  clang/test/Driver/hip-options.hip
  llvm/include/llvm/Transforms/Utils/AMDGPUEmitPrintf.h
  llvm/lib/Transforms/Utils/AMDGPUEmitPrintf.cpp

Index: llvm/lib/Transforms/Utils/AMDGPUEmitPrintf.cpp
===
--- llvm/lib/Transforms/Utils/AMDGPUEmitPrintf.cpp
+++ llvm/lib/Transforms/Utils/AMDGPUEmitPrintf.cpp
@@ -17,6 +17,8 @@
 #include "llvm/Transforms/Utils/AMDGPUEmitPrintf.h"
 #include "llvm/ADT/SparseBitVector.h"
 #include "llvm/Analysis/ValueTracking.h"
+#include "llvm/Support/DataExtractor.h"
+#include "llvm/Support/MD5.h"
 
 using namespace llvm;
 
@@ -179,10 +181,7 @@
 
 // Scan the format string to locate all specifiers, and mark the ones that
 // specify a string, i.e, the "%s" specifier with optional '*' characters.
-static void locateCStrings(SparseBitVector<8> &BV, Value *Fmt) {
-  StringRef Str;
-  if (!getConstantStringInfo(Fmt, Str) || Str.empty())
-return;
+static void locateCStrings(SparseBitVector<8> &BV, Value *Fmt, StringRef &Str) {
 
   static const char ConvSpecifiers[] = "diouxXfFeEgGaAcspn";
   size_t SpecPos = 0;
@@ -207,14 +206,308 @@
   }
 }
 
+// helper struct to package the string related data
+typedef struct StringData {
+  std::string Str = "";
+  bool isConst = true;
+  Value *RealSize = nullptr;
+  Value *AlignedSize = nullptr;
+
+  StringData(std::string str, bool IC, Value *RS,
+Value *AS)
+  : Str(str), isConst(IC), RealSize(RS), AlignedSize(AS) {}
+} StringData;
+
+static inline size_t alignUp(size_t Value, uint Alignment) {
+  return (Value + Alignment - 1) & ~(Alignment - 1);
+}
+
+// Calculates frame size required for current printf expansion and allocates
+// space on printf buffer. Printf frame includes following contents
+// [ ControlDWord , format string/Hash , Arguments (each aligned to 8 byte) ]
+static Value *callBufferedPrintfStart(IRBuilder<> &Builder,
+ ArrayRef &Args, Value *Fmt,
+ bool isConstFmtStr,
+ SparseBitVector<8> &SpecIsCString,
+ SmallVectorImpl &StringContents,
+ Value *&ArgSize) {
+  Value *NonConstStrLen = nullptr;
+
+  // First 4 bytes to be reserved for control dword
+  size_t BufSize = 4;
+  if (isConstFmtStr)
+// First 8 bytes of MD5 hash
+BufSize += 8;
+  else {
+auto LenWithNull = getStrlenWithNull(Builder, Fmt);
+
+// Align the computed length to next 8 byte boundary
+auto TempAdd = Builder.CreateAdd(
+LenWithNull, ConstantInt::get(LenWithNull->getType(), 7U));
+NonConstStrLen = Builder.CreateAnd(
+TempAdd, ConstantInt::get(LenWithNull->getType(), ~7U));
+
+StringContents.push_back(
+StringData("", false, LenWithNull, NonConstStrLen));
+  }
+
+  for (size_t i = 1; i < Args.size(); i++) {
+if (SpecIsCString.test(i)) {
+  StringRef ArgStr;
+  if (getConstantStringInfo(Args[i], ArgStr)) {
+auto alignedLen = alignUp(ArgStr.size() + 1, 8);
+StringContents.push_back(StringData((ArgStr.str() + '\0'), /*isConst*/true,
+/*RealSize*/nullptr, /*AlignedSize*/nullptr));
+BufSize += alignedLen;
+  } else {
+auto LenWithNull = getStrlenWithNull(Builder, Args[i]);
+
+// Align the computed length to next 8 byte boundary
+auto TempAdd = Builder.CreateAdd(
+LenWithNull, ConstantInt::get(LenWithNull->getType(), 7U));
+auto LenWithNullAligned = Builder.CreateAnd(
+TempAdd, ConstantInt::get(LenWithNull->getType(), ~7U));
+
+if (NonConstStrLen) {
+  auto Val = Builder.CreateAdd(LenWithNullAligned, NonConstStrLen,
+   "cumulativeAdd");
+  NonConstStrLen = Val;
+} else
+  NonConstStrLen = LenWithNullAligned;
+
+StringContents.push_back(
+StringData("", false, LenWithNull, LenWithNullAligned));
+  }
+} else
+  // We end up expanding non string arguments to 8 bytes
+  BufSize += 8;
+  }
+
+  // calculate final size value to be passed to printf_alloc
+  Value *SizeToReserve = ConstantInt::get(Builder.getInt64Ty(), BufSize, false);
+  SmallVector Alloc_args;
+  if (NonConstStrLen)
+SizeToReserve = Builder.CreateAdd(NonConstStrLen, SizeToReserve);
+
+  ArgSize = Builder.CreateTrunc(SizeToReserve, Builder.getInt32Ty());
+  Alloc_args.push_back(ArgSize);
+
+  // call the p

[PATCH] D150427: [AMDGPU] Non hostcall printf support for HIP

2023-05-18 Thread Vikram Hegde via Phabricator via cfe-commits
vikramRH added inline comments.



Comment at: llvm/lib/Transforms/Utils/AMDGPUEmitPrintf.cpp:426-427
+
+// The buffered version still follows OpenCL printf standards for
+// printf return value, i.e 0 on success, 1 on failure.
+ConstantPointerNull *zeroIntPtr =

sameerds wrote:
> So we cannot use a buffered printf in HIP?
I guess hostcall version returns number of bytes printed as the printf return 
value, This is not done in Buffered case since we do not print stuff in "real 
time", instead I have decided to stick with OpenCL standard (return 0 on 
success and -1 on failure)


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D150427/new/

https://reviews.llvm.org/D150427

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D150427: [AMDGPU] Non hostcall printf support for HIP

2023-05-18 Thread Vikram Hegde via Phabricator via cfe-commits
vikramRH added inline comments.



Comment at: llvm/lib/Transforms/Utils/AMDGPUEmitPrintf.cpp:458
+auto CreateControlDWord = M->getOrInsertFunction(
+StringRef("__ockl_create_control_dword"), Builder.getInt32Ty(),
+Builder.getInt32Ty(), Int1Ty, Int1Ty);

arsenm wrote:
> vikramRH wrote:
> > arsenm wrote:
> > > Do we really need another ockl control variable for this? Why isn't it a 
> > > parameter? printf=stdout always 
> > There are certain HIP API's such as "hip_assert" that output to stderr. 
> > currently such API's are supported via hostcalls. Although this 
> > implementation does not currently support the API's ,its kept as an option. 
> Right but the way to handle that would be a parameter for where to output, 
> not an externally set global 
I am not clear here, you expect additional inputs to device lib function ?


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D150427/new/

https://reviews.llvm.org/D150427

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D150427: [AMDGPU] Non hostcall printf support for HIP

2023-06-05 Thread Vikram Hegde via Phabricator via cfe-commits
vikramRH updated this revision to Diff 528407.
vikramRH added a comment.

Handled few more review comments


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D150427/new/

https://reviews.llvm.org/D150427

Files:
  clang/include/clang/Basic/TargetOptions.h
  clang/include/clang/Driver/Options.td
  clang/lib/CodeGen/CGGPUBuiltin.cpp
  clang/lib/CodeGen/CodeGenModule.cpp
  clang/lib/Driver/ToolChains/Clang.cpp
  clang/test/CodeGenHIP/default-attributes.hip
  clang/test/CodeGenHIP/printf-kind-module-flag.hip
  clang/test/CodeGenHIP/printf_nonhostcall.cpp
  clang/test/CodeGenHIP/sanitize-undefined-null.hip
  clang/test/Driver/hip-options.hip
  llvm/include/llvm/Transforms/Utils/AMDGPUEmitPrintf.h
  llvm/lib/Transforms/Utils/AMDGPUEmitPrintf.cpp

Index: llvm/lib/Transforms/Utils/AMDGPUEmitPrintf.cpp
===
--- llvm/lib/Transforms/Utils/AMDGPUEmitPrintf.cpp
+++ llvm/lib/Transforms/Utils/AMDGPUEmitPrintf.cpp
@@ -17,6 +17,9 @@
 #include "llvm/Transforms/Utils/AMDGPUEmitPrintf.h"
 #include "llvm/ADT/SparseBitVector.h"
 #include "llvm/Analysis/ValueTracking.h"
+#include "llvm/Support/DataExtractor.h"
+#include "llvm/Support/MD5.h"
+#include "llvm/Support/MathExtras.h"
 
 using namespace llvm;
 
@@ -179,11 +182,7 @@
 
 // Scan the format string to locate all specifiers, and mark the ones that
 // specify a string, i.e, the "%s" specifier with optional '*' characters.
-static void locateCStrings(SparseBitVector<8> &BV, Value *Fmt) {
-  StringRef Str;
-  if (!getConstantStringInfo(Fmt, Str) || Str.empty())
-return;
-
+static void locateCStrings(SparseBitVector<8> &BV, StringRef Str) {
   static const char ConvSpecifiers[] = "diouxXfFeEgGaAcspn";
   size_t SpecPos = 0;
   // Skip the first argument, the format string.
@@ -207,14 +206,302 @@
   }
 }
 
-Value *llvm::emitAMDGPUPrintfCall(IRBuilder<> &Builder,
-  ArrayRef Args) {
+// helper struct to package the string related data
+struct StringData {
+  StringRef Str;
+  Value *RealSize = nullptr;
+  Value *AlignedSize = nullptr;
+  bool IsConst = true;
+
+  StringData(StringRef ST, Value *RS, Value *AS, bool IC)
+  : Str(ST), RealSize(RS), AlignedSize(AS), IsConst(IC) {}
+};
+
+// Calculates frame size required for current printf expansion and allocates
+// space on printf buffer. Printf frame includes following contents
+// [ ControlDWord , format string/Hash , Arguments (each aligned to 8 byte) ]
+static Value *callBufferedPrintfStart(
+IRBuilder<> &Builder, ArrayRef Args, Value *Fmt,
+bool isConstFmtStr, SparseBitVector<8> &SpecIsCString,
+SmallVectorImpl &StringContents, Value *&ArgSize) {
+  Module *M = Builder.GetInsertBlock()->getModule();
+  Value *NonConstStrLen = nullptr;
+  Value *LenWithNull = nullptr;
+  Value *LenWithNullAligned = nullptr;
+  Value *TempAdd = nullptr;
+
+  // First 4 bytes to be reserved for control dword
+  size_t BufSize = 4;
+  if (isConstFmtStr)
+// First 8 bytes of MD5 hash
+BufSize += 8;
+  else {
+LenWithNull = getStrlenWithNull(Builder, Fmt);
+
+// Align the computed length to next 8 byte boundary
+TempAdd = Builder.CreateAdd(LenWithNull,
+ConstantInt::get(LenWithNull->getType(), 7U));
+NonConstStrLen = Builder.CreateAnd(
+TempAdd, ConstantInt::get(LenWithNull->getType(), ~7U));
+
+StringContents.push_back(
+StringData(StringRef(), LenWithNull, NonConstStrLen, false));
+  }
+
+  for (size_t i = 1; i < Args.size(); i++) {
+if (SpecIsCString.test(i)) {
+  StringRef ArgStr;
+  if (getConstantStringInfo(Args[i], ArgStr)) {
+auto alignedLen = alignTo(ArgStr.size() + 1, 8);
+StringContents.push_back(StringData(
+ArgStr,
+/*RealSize*/ nullptr, /*AlignedSize*/ nullptr, /*IsConst*/ true));
+BufSize += alignedLen;
+  } else {
+LenWithNull = getStrlenWithNull(Builder, Args[i]);
+
+// Align the computed length to next 8 byte boundary
+TempAdd = Builder.CreateAdd(
+LenWithNull, ConstantInt::get(LenWithNull->getType(), 7U));
+LenWithNullAligned = Builder.CreateAnd(
+TempAdd, ConstantInt::get(LenWithNull->getType(), ~7U));
+
+if (NonConstStrLen) {
+  auto Val = Builder.CreateAdd(LenWithNullAligned, NonConstStrLen,
+   "cumulativeAdd");
+  NonConstStrLen = Val;
+} else
+  NonConstStrLen = LenWithNullAligned;
+
+StringContents.push_back(
+StringData("", LenWithNull, LenWithNullAligned, false));
+  }
+} else
+  // We end up expanding non string arguments to 8 bytes
+  BufSize += 8;
+  }
+
+  // calculate final size value to be passed to printf_alloc
+  Value *SizeToReserve = ConstantInt::get(Builder.getInt64Ty(), BufSize, false);
+  SmallVector Alloc_args;
+  if (NonConstStrLen)
+SizeToReser

[PATCH] D150427: [AMDGPU] Non hostcall printf support for HIP

2023-06-05 Thread Vikram Hegde via Phabricator via cfe-commits
vikramRH marked 4 inline comments as done.
vikramRH added inline comments.



Comment at: clang/test/CodeGenHIP/printf_nonhostcall.cpp:228
+  return printf(s, 10);
+}

arsenm wrote:
> vikramRH wrote:
> > arsenm wrote:
> > > Need some vector tests, especially 3 x vectors 
> > vectors are not yet supported as in the hostcall case since the code 
> > currently runs for HIP only. I plan to port the OpenCL lowering also here 
> > along with hostcall support . we would need to extend hostcall and this 
> > implementation to support vectors as part of the porting activity.
> Something still needs to happen for vectors, it can't just crash. The 
> different types of vectors may appear regardless of language 
I have forced "-Werror=format-invalid-specifier" with printf option which 
should be a sufficient condition for vectors I guess ? (this causes clang to 
break out earlier with an error)



Comment at: llvm/lib/Transforms/Utils/AMDGPUEmitPrintf.cpp:458
+auto CreateControlDWord = M->getOrInsertFunction(
+StringRef("__ockl_create_control_dword"), Builder.getInt32Ty(),
+Builder.getInt32Ty(), Int1Ty, Int1Ty);

arsenm wrote:
> vikramRH wrote:
> > arsenm wrote:
> > > sameerds wrote:
> > > > vikramRH wrote:
> > > > > arsenm wrote:
> > > > > > vikramRH wrote:
> > > > > > > arsenm wrote:
> > > > > > > > Do we really need another ockl control variable for this? Why 
> > > > > > > > isn't it a parameter? printf=stdout always 
> > > > > > > There are certain HIP API's such as "hip_assert" that output to 
> > > > > > > stderr. currently such API's are supported via hostcalls. 
> > > > > > > Although this implementation does not currently support the API's 
> > > > > > > ,its kept as an option. 
> > > > > > Right but the way to handle that would be a parameter for where to 
> > > > > > output, not an externally set global 
> > > > > I am not clear here, you expect additional inputs to device lib 
> > > > > function ?
> > > > @arsenm, this "control word" written into the buffer. In that sense, it 
> > > > is indeed a parameter passed from device to host as part of the printf 
> > > > packet. It is not yet another global variable.
> > > I'm not following why this part requires introducing another call into 
> > > device libs. It's expanding this not-really defined API. I'd expect this 
> > > to be a trivial function we can expand inline here and write directly 
> > > into the parameter buffer?
> > I started the implementation keeping in mind the device side asserts 
> > currenlty supported via hostcall. They use explicit calls to such device 
> > lib functions in their definitions. I could not support them now but these 
> > would be required if I ever decide to. also I never ended up inlining this 
> > call. do you think its really necessary ?
> But that implementation is in the library itself? What does the function 
> actually do?
> 
> I think we're better off the more defined/documented ABI we have in terms of 
> size-and-offset from base pointer than call into unstable library calls.
> 
> Can we get some documentation on the ABI in AMDGPUUsage?
The library function itself just handles creation of controlDWord via minor bit 
manipulations. nevertheless Inlining is pretty simple and im fine either way. 
Call has been expanded.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D150427/new/

https://reviews.llvm.org/D150427

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D150427: [AMDGPU] Non hostcall printf support for HIP

2023-06-07 Thread Vikram Hegde via Phabricator via cfe-commits
vikramRH updated this revision to Diff 529207.
vikramRH added a comment.

Further review comments


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D150427/new/

https://reviews.llvm.org/D150427

Files:
  clang/docs/ReleaseNotes.rst
  clang/include/clang/Basic/TargetOptions.h
  clang/include/clang/Driver/Options.td
  clang/lib/CodeGen/CGGPUBuiltin.cpp
  clang/lib/CodeGen/CodeGenModule.cpp
  clang/lib/Driver/ToolChains/Clang.cpp
  clang/test/CodeGenHIP/default-attributes.hip
  clang/test/CodeGenHIP/printf-kind-module-flag.hip
  clang/test/CodeGenHIP/printf_nonhostcall.cpp
  clang/test/CodeGenHIP/sanitize-undefined-null.hip
  clang/test/Driver/hip-options.hip
  llvm/include/llvm/Transforms/Utils/AMDGPUEmitPrintf.h
  llvm/lib/Transforms/Utils/AMDGPUEmitPrintf.cpp

Index: llvm/lib/Transforms/Utils/AMDGPUEmitPrintf.cpp
===
--- llvm/lib/Transforms/Utils/AMDGPUEmitPrintf.cpp
+++ llvm/lib/Transforms/Utils/AMDGPUEmitPrintf.cpp
@@ -17,6 +17,9 @@
 #include "llvm/Transforms/Utils/AMDGPUEmitPrintf.h"
 #include "llvm/ADT/SparseBitVector.h"
 #include "llvm/Analysis/ValueTracking.h"
+#include "llvm/Support/DataExtractor.h"
+#include "llvm/Support/MD5.h"
+#include "llvm/Support/MathExtras.h"
 
 using namespace llvm;
 
@@ -179,11 +182,7 @@
 
 // Scan the format string to locate all specifiers, and mark the ones that
 // specify a string, i.e, the "%s" specifier with optional '*' characters.
-static void locateCStrings(SparseBitVector<8> &BV, Value *Fmt) {
-  StringRef Str;
-  if (!getConstantStringInfo(Fmt, Str) || Str.empty())
-return;
-
+static void locateCStrings(SparseBitVector<8> &BV, StringRef Str) {
   static const char ConvSpecifiers[] = "diouxXfFeEgGaAcspn";
   size_t SpecPos = 0;
   // Skip the first argument, the format string.
@@ -207,14 +206,304 @@
   }
 }
 
-Value *llvm::emitAMDGPUPrintfCall(IRBuilder<> &Builder,
-  ArrayRef Args) {
+// helper struct to package the string related data
+struct StringData {
+  StringRef Str;
+  Value *RealSize = nullptr;
+  Value *AlignedSize = nullptr;
+  bool IsConst = true;
+
+  StringData(StringRef ST, Value *RS, Value *AS, bool IC)
+  : Str(ST), RealSize(RS), AlignedSize(AS), IsConst(IC) {}
+};
+
+// Calculates frame size required for current printf expansion and allocates
+// space on printf buffer. Printf frame includes following contents
+// [ ControlDWord , format string/Hash , Arguments (each aligned to 8 byte) ]
+static Value *callBufferedPrintfStart(
+IRBuilder<> &Builder, ArrayRef Args, Value *Fmt,
+bool isConstFmtStr, SparseBitVector<8> &SpecIsCString,
+SmallVectorImpl &StringContents, Value *&ArgSize) {
+  Module *M = Builder.GetInsertBlock()->getModule();
+  Value *NonConstStrLen = nullptr;
+  Value *LenWithNull = nullptr;
+  Value *LenWithNullAligned = nullptr;
+  Value *TempAdd = nullptr;
+
+  // First 4 bytes to be reserved for control dword
+  size_t BufSize = 4;
+  if (isConstFmtStr)
+// First 8 bytes of MD5 hash
+BufSize += 8;
+  else {
+LenWithNull = getStrlenWithNull(Builder, Fmt);
+
+// Align the computed length to next 8 byte boundary
+TempAdd = Builder.CreateAdd(LenWithNull,
+ConstantInt::get(LenWithNull->getType(), 7U));
+NonConstStrLen = Builder.CreateAnd(
+TempAdd, ConstantInt::get(LenWithNull->getType(), ~7U));
+
+StringContents.push_back(
+StringData(StringRef(), LenWithNull, NonConstStrLen, false));
+  }
+
+  for (size_t i = 1; i < Args.size(); i++) {
+if (SpecIsCString.test(i)) {
+  StringRef ArgStr;
+  if (getConstantStringInfo(Args[i], ArgStr)) {
+auto alignedLen = alignTo(ArgStr.size() + 1, 8);
+StringContents.push_back(StringData(
+ArgStr,
+/*RealSize*/ nullptr, /*AlignedSize*/ nullptr, /*IsConst*/ true));
+BufSize += alignedLen;
+  } else {
+LenWithNull = getStrlenWithNull(Builder, Args[i]);
+
+// Align the computed length to next 8 byte boundary
+TempAdd = Builder.CreateAdd(
+LenWithNull, ConstantInt::get(LenWithNull->getType(), 7U));
+LenWithNullAligned = Builder.CreateAnd(
+TempAdd, ConstantInt::get(LenWithNull->getType(), ~7U));
+
+if (NonConstStrLen) {
+  auto Val = Builder.CreateAdd(LenWithNullAligned, NonConstStrLen,
+   "cumulativeAdd");
+  NonConstStrLen = Val;
+} else
+  NonConstStrLen = LenWithNullAligned;
+
+StringContents.push_back(
+StringData("", LenWithNull, LenWithNullAligned, false));
+  }
+} else
+  // We end up expanding non string arguments to 8 bytes
+  BufSize += 8;
+  }
+
+  // calculate final size value to be passed to printf_alloc
+  Value *SizeToReserve = ConstantInt::get(Builder.getInt64Ty(), BufSize, false);
+  SmallVector Alloc_args;
+  if (NonConstStr

[PATCH] D150427: [AMDGPU] Non hostcall printf support for HIP

2023-06-07 Thread Vikram Hegde via Phabricator via cfe-commits
vikramRH marked 4 inline comments as done.
vikramRH added inline comments.



Comment at: clang/test/CodeGenHIP/printf_nonhostcall.cpp:137
+
+__device__ float f1 = 3.14f;
+__device__ double f2 = 2.71828;

arsenm wrote:
> Also half 
C++ default arg promotions does not seem to list float16 case and consequently 
clang does not handle it. I have handled this case by promoting to double 
currenlty. Clang however still emits a warning when used with a %f conv 
specifier



Comment at: llvm/lib/Transforms/Utils/AMDGPUEmitPrintf.cpp:386-387
+} else {
+  auto IntTy = dyn_cast(Args[i]->getType());
+  if (IntTy && IntTy->getBitWidth() == 32)
+WhatToStore.push_back(

arsenm wrote:
> isIntegerTy(32).
> 
> I also do not understand why only 32-bit integers would be promoted to 64-bit 
> (or why this would be zext). This doesn't match varargs ABI handling, where 
> everything smaller would be promoted to i32.
Right, But default promotions would have happened already, this additional 
promotion is due to runtime processing requirement


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D150427/new/

https://reviews.llvm.org/D150427

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D150427: [AMDGPU] Non hostcall printf support for HIP

2023-06-07 Thread Vikram Hegde via Phabricator via cfe-commits
vikramRH updated this revision to Diff 529219.
vikramRH marked an inline comment as done.
vikramRH added a comment.

updated formating


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D150427/new/

https://reviews.llvm.org/D150427

Files:
  clang/docs/ReleaseNotes.rst
  clang/include/clang/Basic/TargetOptions.h
  clang/include/clang/Driver/Options.td
  clang/lib/CodeGen/CGGPUBuiltin.cpp
  clang/lib/CodeGen/CodeGenModule.cpp
  clang/lib/Driver/ToolChains/Clang.cpp
  clang/test/CodeGenHIP/default-attributes.hip
  clang/test/CodeGenHIP/printf-kind-module-flag.hip
  clang/test/CodeGenHIP/printf_nonhostcall.cpp
  clang/test/CodeGenHIP/sanitize-undefined-null.hip
  clang/test/Driver/hip-options.hip
  llvm/include/llvm/Transforms/Utils/AMDGPUEmitPrintf.h
  llvm/lib/Transforms/Utils/AMDGPUEmitPrintf.cpp

Index: llvm/lib/Transforms/Utils/AMDGPUEmitPrintf.cpp
===
--- llvm/lib/Transforms/Utils/AMDGPUEmitPrintf.cpp
+++ llvm/lib/Transforms/Utils/AMDGPUEmitPrintf.cpp
@@ -17,6 +17,9 @@
 #include "llvm/Transforms/Utils/AMDGPUEmitPrintf.h"
 #include "llvm/ADT/SparseBitVector.h"
 #include "llvm/Analysis/ValueTracking.h"
+#include "llvm/Support/DataExtractor.h"
+#include "llvm/Support/MD5.h"
+#include "llvm/Support/MathExtras.h"
 
 using namespace llvm;
 
@@ -179,11 +182,7 @@
 
 // Scan the format string to locate all specifiers, and mark the ones that
 // specify a string, i.e, the "%s" specifier with optional '*' characters.
-static void locateCStrings(SparseBitVector<8> &BV, Value *Fmt) {
-  StringRef Str;
-  if (!getConstantStringInfo(Fmt, Str) || Str.empty())
-return;
-
+static void locateCStrings(SparseBitVector<8> &BV, StringRef Str) {
   static const char ConvSpecifiers[] = "diouxXfFeEgGaAcspn";
   size_t SpecPos = 0;
   // Skip the first argument, the format string.
@@ -207,14 +206,305 @@
   }
 }
 
-Value *llvm::emitAMDGPUPrintfCall(IRBuilder<> &Builder,
-  ArrayRef Args) {
+// helper struct to package the string related data
+struct StringData {
+  StringRef Str;
+  Value *RealSize = nullptr;
+  Value *AlignedSize = nullptr;
+  bool IsConst = true;
+
+  StringData(StringRef ST, Value *RS, Value *AS, bool IC)
+  : Str(ST), RealSize(RS), AlignedSize(AS), IsConst(IC) {}
+};
+
+// Calculates frame size required for current printf expansion and allocates
+// space on printf buffer. Printf frame includes following contents
+// [ ControlDWord , format string/Hash , Arguments (each aligned to 8 byte) ]
+static Value *callBufferedPrintfStart(
+IRBuilder<> &Builder, ArrayRef Args, Value *Fmt,
+bool isConstFmtStr, SparseBitVector<8> &SpecIsCString,
+SmallVectorImpl &StringContents, Value *&ArgSize) {
+  Module *M = Builder.GetInsertBlock()->getModule();
+  Value *NonConstStrLen = nullptr;
+  Value *LenWithNull = nullptr;
+  Value *LenWithNullAligned = nullptr;
+  Value *TempAdd = nullptr;
+
+  // First 4 bytes to be reserved for control dword
+  size_t BufSize = 4;
+  if (isConstFmtStr)
+// First 8 bytes of MD5 hash
+BufSize += 8;
+  else {
+LenWithNull = getStrlenWithNull(Builder, Fmt);
+
+// Align the computed length to next 8 byte boundary
+TempAdd = Builder.CreateAdd(LenWithNull,
+ConstantInt::get(LenWithNull->getType(), 7U));
+NonConstStrLen = Builder.CreateAnd(
+TempAdd, ConstantInt::get(LenWithNull->getType(), ~7U));
+
+StringContents.push_back(
+StringData(StringRef(), LenWithNull, NonConstStrLen, false));
+  }
+
+  for (size_t i = 1; i < Args.size(); i++) {
+if (SpecIsCString.test(i)) {
+  StringRef ArgStr;
+  if (getConstantStringInfo(Args[i], ArgStr)) {
+auto alignedLen = alignTo(ArgStr.size() + 1, 8);
+StringContents.push_back(StringData(
+ArgStr,
+/*RealSize*/ nullptr, /*AlignedSize*/ nullptr, /*IsConst*/ true));
+BufSize += alignedLen;
+  } else {
+LenWithNull = getStrlenWithNull(Builder, Args[i]);
+
+// Align the computed length to next 8 byte boundary
+TempAdd = Builder.CreateAdd(
+LenWithNull, ConstantInt::get(LenWithNull->getType(), 7U));
+LenWithNullAligned = Builder.CreateAnd(
+TempAdd, ConstantInt::get(LenWithNull->getType(), ~7U));
+
+if (NonConstStrLen) {
+  auto Val = Builder.CreateAdd(LenWithNullAligned, NonConstStrLen,
+   "cumulativeAdd");
+  NonConstStrLen = Val;
+} else
+  NonConstStrLen = LenWithNullAligned;
+
+StringContents.push_back(
+StringData("", LenWithNull, LenWithNullAligned, false));
+  }
+} else
+  // We end up expanding non string arguments to 8 bytes
+  BufSize += 8;
+  }
+
+  // calculate final size value to be passed to printf_alloc
+  Value *SizeToReserve = ConstantInt::get(Builder.getInt64Ty(), BufSize, false);
+  Small

[PATCH] D150427: [AMDGPU] Non hostcall printf support for HIP

2023-06-07 Thread Vikram Hegde via Phabricator via cfe-commits
vikramRH added inline comments.



Comment at: llvm/lib/Transforms/Utils/AMDGPUEmitPrintf.cpp:386-387
+} else {
+  auto IntTy = dyn_cast(Args[i]->getType());
+  if (IntTy && IntTy->getBitWidth() == 32)
+WhatToStore.push_back(

arsenm wrote:
> vikramRH wrote:
> > arsenm wrote:
> > > isIntegerTy(32).
> > > 
> > > I also do not understand why only 32-bit integers would be promoted to 
> > > 64-bit (or why this would be zext). This doesn't match varargs ABI 
> > > handling, where everything smaller would be promoted to i32.
> > Right, But default promotions would have happened already, this additional 
> > promotion is due to runtime processing requirement
> Is this documented somewhere? If it's promote everything to i64, I'd prefer 
> to handle all types rather than special casing 32
That's precisely what the function "fitArgInto64Bits()" did. We eliminated it 
due to some redundant instructions generated. int32 was the only case (apart 
from float16 now) that required this special casting.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D150427/new/

https://reviews.llvm.org/D150427

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D150427: [AMDGPU] Non hostcall printf support for HIP

2023-06-08 Thread Vikram Hegde via Phabricator via cfe-commits
vikramRH updated this revision to Diff 529551.
vikramRH added a comment.

Reafactored non string arg handling into a seperate function with additional 
asserts


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D150427/new/

https://reviews.llvm.org/D150427

Files:
  clang/docs/ReleaseNotes.rst
  clang/include/clang/Basic/TargetOptions.h
  clang/include/clang/Driver/Options.td
  clang/lib/CodeGen/CGGPUBuiltin.cpp
  clang/lib/CodeGen/CodeGenModule.cpp
  clang/lib/Driver/ToolChains/Clang.cpp
  clang/test/CodeGenHIP/default-attributes.hip
  clang/test/CodeGenHIP/printf-kind-module-flag.hip
  clang/test/CodeGenHIP/printf_nonhostcall.cpp
  clang/test/CodeGenHIP/sanitize-undefined-null.hip
  clang/test/Driver/hip-options.hip
  llvm/include/llvm/Transforms/Utils/AMDGPUEmitPrintf.h
  llvm/lib/Transforms/Utils/AMDGPUEmitPrintf.cpp

Index: llvm/lib/Transforms/Utils/AMDGPUEmitPrintf.cpp
===
--- llvm/lib/Transforms/Utils/AMDGPUEmitPrintf.cpp
+++ llvm/lib/Transforms/Utils/AMDGPUEmitPrintf.cpp
@@ -17,6 +17,9 @@
 #include "llvm/Transforms/Utils/AMDGPUEmitPrintf.h"
 #include "llvm/ADT/SparseBitVector.h"
 #include "llvm/Analysis/ValueTracking.h"
+#include "llvm/Support/DataExtractor.h"
+#include "llvm/Support/MD5.h"
+#include "llvm/Support/MathExtras.h"
 
 using namespace llvm;
 
@@ -179,11 +182,7 @@
 
 // Scan the format string to locate all specifiers, and mark the ones that
 // specify a string, i.e, the "%s" specifier with optional '*' characters.
-static void locateCStrings(SparseBitVector<8> &BV, Value *Fmt) {
-  StringRef Str;
-  if (!getConstantStringInfo(Fmt, Str) || Str.empty())
-return;
-
+static void locateCStrings(SparseBitVector<8> &BV, StringRef Str) {
   static const char ConvSpecifiers[] = "diouxXfFeEgGaAcspn";
   size_t SpecPos = 0;
   // Skip the first argument, the format string.
@@ -207,14 +206,320 @@
   }
 }
 
-Value *llvm::emitAMDGPUPrintfCall(IRBuilder<> &Builder,
-  ArrayRef Args) {
+// helper struct to package the string related data
+struct StringData {
+  StringRef Str;
+  Value *RealSize = nullptr;
+  Value *AlignedSize = nullptr;
+  bool IsConst = true;
+
+  StringData(StringRef ST, Value *RS, Value *AS, bool IC)
+  : Str(ST), RealSize(RS), AlignedSize(AS), IsConst(IC) {}
+};
+
+// Calculates frame size required for current printf expansion and allocates
+// space on printf buffer. Printf frame includes following contents
+// [ ControlDWord , format string/Hash , Arguments (each aligned to 8 byte) ]
+static Value *callBufferedPrintfStart(
+IRBuilder<> &Builder, ArrayRef Args, Value *Fmt,
+bool isConstFmtStr, SparseBitVector<8> &SpecIsCString,
+SmallVectorImpl &StringContents, Value *&ArgSize) {
+  Module *M = Builder.GetInsertBlock()->getModule();
+  Value *NonConstStrLen = nullptr;
+  Value *LenWithNull = nullptr;
+  Value *LenWithNullAligned = nullptr;
+  Value *TempAdd = nullptr;
+
+  // First 4 bytes to be reserved for control dword
+  size_t BufSize = 4;
+  if (isConstFmtStr)
+// First 8 bytes of MD5 hash
+BufSize += 8;
+  else {
+LenWithNull = getStrlenWithNull(Builder, Fmt);
+
+// Align the computed length to next 8 byte boundary
+TempAdd = Builder.CreateAdd(LenWithNull,
+ConstantInt::get(LenWithNull->getType(), 7U));
+NonConstStrLen = Builder.CreateAnd(
+TempAdd, ConstantInt::get(LenWithNull->getType(), ~7U));
+
+StringContents.push_back(
+StringData(StringRef(), LenWithNull, NonConstStrLen, false));
+  }
+
+  for (size_t i = 1; i < Args.size(); i++) {
+if (SpecIsCString.test(i)) {
+  StringRef ArgStr;
+  if (getConstantStringInfo(Args[i], ArgStr)) {
+auto alignedLen = alignTo(ArgStr.size() + 1, 8);
+StringContents.push_back(StringData(
+ArgStr,
+/*RealSize*/ nullptr, /*AlignedSize*/ nullptr, /*IsConst*/ true));
+BufSize += alignedLen;
+  } else {
+LenWithNull = getStrlenWithNull(Builder, Args[i]);
+
+// Align the computed length to next 8 byte boundary
+TempAdd = Builder.CreateAdd(
+LenWithNull, ConstantInt::get(LenWithNull->getType(), 7U));
+LenWithNullAligned = Builder.CreateAnd(
+TempAdd, ConstantInt::get(LenWithNull->getType(), ~7U));
+
+if (NonConstStrLen) {
+  auto Val = Builder.CreateAdd(LenWithNullAligned, NonConstStrLen,
+   "cumulativeAdd");
+  NonConstStrLen = Val;
+} else
+  NonConstStrLen = LenWithNullAligned;
+
+StringContents.push_back(
+StringData(StringRef(), LenWithNull, LenWithNullAligned, false));
+  }
+} else
+  // We end up expanding non string arguments to 8 bytes
+  BufSize += 8;
+  }
+
+  // calculate final size value to be passed to printf_alloc
+  Value *SizeToReserve = ConstantInt::get(Builder.getIn

[PATCH] D150427: [AMDGPU] Non hostcall printf support for HIP

2023-06-09 Thread Vikram Hegde via Phabricator via cfe-commits
vikramRH updated this revision to Diff 529865.
vikramRH added a comment.

Handled last set of review comments from @arsenm, would be willing to handle 
any new findings/concerns via additional patches


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D150427/new/

https://reviews.llvm.org/D150427

Files:
  clang/docs/ReleaseNotes.rst
  clang/include/clang/Basic/TargetOptions.h
  clang/include/clang/Driver/Options.td
  clang/lib/CodeGen/CGGPUBuiltin.cpp
  clang/lib/CodeGen/CodeGenModule.cpp
  clang/lib/Driver/ToolChains/Clang.cpp
  clang/test/CodeGenHIP/default-attributes.hip
  clang/test/CodeGenHIP/printf-kind-module-flag.hip
  clang/test/CodeGenHIP/printf_nonhostcall.cpp
  clang/test/CodeGenHIP/sanitize-undefined-null.hip
  clang/test/Driver/hip-options.hip
  llvm/include/llvm/Transforms/Utils/AMDGPUEmitPrintf.h
  llvm/lib/Transforms/Utils/AMDGPUEmitPrintf.cpp

Index: llvm/lib/Transforms/Utils/AMDGPUEmitPrintf.cpp
===
--- llvm/lib/Transforms/Utils/AMDGPUEmitPrintf.cpp
+++ llvm/lib/Transforms/Utils/AMDGPUEmitPrintf.cpp
@@ -17,6 +17,9 @@
 #include "llvm/Transforms/Utils/AMDGPUEmitPrintf.h"
 #include "llvm/ADT/SparseBitVector.h"
 #include "llvm/Analysis/ValueTracking.h"
+#include "llvm/Support/DataExtractor.h"
+#include "llvm/Support/MD5.h"
+#include "llvm/Support/MathExtras.h"
 
 using namespace llvm;
 
@@ -179,11 +182,7 @@
 
 // Scan the format string to locate all specifiers, and mark the ones that
 // specify a string, i.e, the "%s" specifier with optional '*' characters.
-static void locateCStrings(SparseBitVector<8> &BV, Value *Fmt) {
-  StringRef Str;
-  if (!getConstantStringInfo(Fmt, Str) || Str.empty())
-return;
-
+static void locateCStrings(SparseBitVector<8> &BV, StringRef Str) {
   static const char ConvSpecifiers[] = "diouxXfFeEgGaAcspn";
   size_t SpecPos = 0;
   // Skip the first argument, the format string.
@@ -207,14 +206,322 @@
   }
 }
 
-Value *llvm::emitAMDGPUPrintfCall(IRBuilder<> &Builder,
-  ArrayRef Args) {
+// helper struct to package the string related data
+struct StringData {
+  StringRef Str;
+  Value *RealSize = nullptr;
+  Value *AlignedSize = nullptr;
+  bool IsConst = true;
+
+  StringData(StringRef ST, Value *RS, Value *AS, bool IC)
+  : Str(ST), RealSize(RS), AlignedSize(AS), IsConst(IC) {}
+};
+
+// Calculates frame size required for current printf expansion and allocates
+// space on printf buffer. Printf frame includes following contents
+// [ ControlDWord , format string/Hash , Arguments (each aligned to 8 byte) ]
+static Value *callBufferedPrintfStart(
+IRBuilder<> &Builder, ArrayRef Args, Value *Fmt,
+bool isConstFmtStr, SparseBitVector<8> &SpecIsCString,
+SmallVectorImpl &StringContents, Value *&ArgSize) {
+  Module *M = Builder.GetInsertBlock()->getModule();
+  Value *NonConstStrLen = nullptr;
+  Value *LenWithNull = nullptr;
+  Value *LenWithNullAligned = nullptr;
+  Value *TempAdd = nullptr;
+
+  // First 4 bytes to be reserved for control dword
+  size_t BufSize = 4;
+  if (isConstFmtStr)
+// First 8 bytes of MD5 hash
+BufSize += 8;
+  else {
+LenWithNull = getStrlenWithNull(Builder, Fmt);
+
+// Align the computed length to next 8 byte boundary
+TempAdd = Builder.CreateAdd(LenWithNull,
+ConstantInt::get(LenWithNull->getType(), 7U));
+NonConstStrLen = Builder.CreateAnd(
+TempAdd, ConstantInt::get(LenWithNull->getType(), ~7U));
+
+StringContents.push_back(
+StringData(StringRef(), LenWithNull, NonConstStrLen, false));
+  }
+
+  for (size_t i = 1; i < Args.size(); i++) {
+if (SpecIsCString.test(i)) {
+  StringRef ArgStr;
+  if (getConstantStringInfo(Args[i], ArgStr)) {
+auto alignedLen = alignTo(ArgStr.size() + 1, 8);
+StringContents.push_back(StringData(
+ArgStr,
+/*RealSize*/ nullptr, /*AlignedSize*/ nullptr, /*IsConst*/ true));
+BufSize += alignedLen;
+  } else {
+LenWithNull = getStrlenWithNull(Builder, Args[i]);
+
+// Align the computed length to next 8 byte boundary
+TempAdd = Builder.CreateAdd(
+LenWithNull, ConstantInt::get(LenWithNull->getType(), 7U));
+LenWithNullAligned = Builder.CreateAnd(
+TempAdd, ConstantInt::get(LenWithNull->getType(), ~7U));
+
+if (NonConstStrLen) {
+  auto Val = Builder.CreateAdd(LenWithNullAligned, NonConstStrLen,
+   "cumulativeAdd");
+  NonConstStrLen = Val;
+} else
+  NonConstStrLen = LenWithNullAligned;
+
+StringContents.push_back(
+StringData(StringRef(), LenWithNull, LenWithNullAligned, false));
+  }
+} else {
+  auto AllocSize = M->getDataLayout().getTypeAllocSize(Args[i]->getType());
+  if (AllocSize <= 8)
+// We end up expanding non string arguments to 8

[PATCH] D150427: [AMDGPU] Non hostcall printf support for HIP

2023-06-10 Thread Vikram Hegde via Phabricator via cfe-commits
This revision was landed with ongoing or failed builds.
This revision was automatically updated to reflect the committed changes.
Closed by commit rG631c965483e0: [AMDGPU] Non hostcall printf support for HIP 
(authored by vikramRH).

Changed prior to commit:
  https://reviews.llvm.org/D150427?vs=529865&id=530207#toc

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D150427/new/

https://reviews.llvm.org/D150427

Files:
  clang/docs/ReleaseNotes.rst
  clang/include/clang/Basic/TargetOptions.h
  clang/include/clang/Driver/Options.td
  clang/lib/CodeGen/CGGPUBuiltin.cpp
  clang/lib/CodeGen/CodeGenModule.cpp
  clang/lib/Driver/ToolChains/Clang.cpp
  clang/test/CodeGenHIP/default-attributes.hip
  clang/test/CodeGenHIP/printf-kind-module-flag.hip
  clang/test/CodeGenHIP/printf_nonhostcall.cpp
  clang/test/CodeGenHIP/sanitize-undefined-null.hip
  clang/test/Driver/hip-options.hip
  llvm/include/llvm/Transforms/Utils/AMDGPUEmitPrintf.h
  llvm/lib/Transforms/Utils/AMDGPUEmitPrintf.cpp

Index: llvm/lib/Transforms/Utils/AMDGPUEmitPrintf.cpp
===
--- llvm/lib/Transforms/Utils/AMDGPUEmitPrintf.cpp
+++ llvm/lib/Transforms/Utils/AMDGPUEmitPrintf.cpp
@@ -17,6 +17,9 @@
 #include "llvm/Transforms/Utils/AMDGPUEmitPrintf.h"
 #include "llvm/ADT/SparseBitVector.h"
 #include "llvm/Analysis/ValueTracking.h"
+#include "llvm/Support/DataExtractor.h"
+#include "llvm/Support/MD5.h"
+#include "llvm/Support/MathExtras.h"
 
 using namespace llvm;
 
@@ -179,11 +182,7 @@
 
 // Scan the format string to locate all specifiers, and mark the ones that
 // specify a string, i.e, the "%s" specifier with optional '*' characters.
-static void locateCStrings(SparseBitVector<8> &BV, Value *Fmt) {
-  StringRef Str;
-  if (!getConstantStringInfo(Fmt, Str) || Str.empty())
-return;
-
+static void locateCStrings(SparseBitVector<8> &BV, StringRef Str) {
   static const char ConvSpecifiers[] = "diouxXfFeEgGaAcspn";
   size_t SpecPos = 0;
   // Skip the first argument, the format string.
@@ -207,14 +206,319 @@
   }
 }
 
-Value *llvm::emitAMDGPUPrintfCall(IRBuilder<> &Builder,
-  ArrayRef Args) {
+// helper struct to package the string related data
+struct StringData {
+  StringRef Str;
+  Value *RealSize = nullptr;
+  Value *AlignedSize = nullptr;
+  bool IsConst = true;
+
+  StringData(StringRef ST, Value *RS, Value *AS, bool IC)
+  : Str(ST), RealSize(RS), AlignedSize(AS), IsConst(IC) {}
+};
+
+// Calculates frame size required for current printf expansion and allocates
+// space on printf buffer. Printf frame includes following contents
+// [ ControlDWord , format string/Hash , Arguments (each aligned to 8 byte) ]
+static Value *callBufferedPrintfStart(
+IRBuilder<> &Builder, ArrayRef Args, Value *Fmt,
+bool isConstFmtStr, SparseBitVector<8> &SpecIsCString,
+SmallVectorImpl &StringContents, Value *&ArgSize) {
+  Module *M = Builder.GetInsertBlock()->getModule();
+  Value *NonConstStrLen = nullptr;
+  Value *LenWithNull = nullptr;
+  Value *LenWithNullAligned = nullptr;
+  Value *TempAdd = nullptr;
+
+  // First 4 bytes to be reserved for control dword
+  size_t BufSize = 4;
+  if (isConstFmtStr)
+// First 8 bytes of MD5 hash
+BufSize += 8;
+  else {
+LenWithNull = getStrlenWithNull(Builder, Fmt);
+
+// Align the computed length to next 8 byte boundary
+TempAdd = Builder.CreateAdd(LenWithNull,
+ConstantInt::get(LenWithNull->getType(), 7U));
+NonConstStrLen = Builder.CreateAnd(
+TempAdd, ConstantInt::get(LenWithNull->getType(), ~7U));
+
+StringContents.push_back(
+StringData(StringRef(), LenWithNull, NonConstStrLen, false));
+  }
+
+  for (size_t i = 1; i < Args.size(); i++) {
+if (SpecIsCString.test(i)) {
+  StringRef ArgStr;
+  if (getConstantStringInfo(Args[i], ArgStr)) {
+auto alignedLen = alignTo(ArgStr.size() + 1, 8);
+StringContents.push_back(StringData(
+ArgStr,
+/*RealSize*/ nullptr, /*AlignedSize*/ nullptr, /*IsConst*/ true));
+BufSize += alignedLen;
+  } else {
+LenWithNull = getStrlenWithNull(Builder, Args[i]);
+
+// Align the computed length to next 8 byte boundary
+TempAdd = Builder.CreateAdd(
+LenWithNull, ConstantInt::get(LenWithNull->getType(), 7U));
+LenWithNullAligned = Builder.CreateAnd(
+TempAdd, ConstantInt::get(LenWithNull->getType(), ~7U));
+
+if (NonConstStrLen) {
+  auto Val = Builder.CreateAdd(LenWithNullAligned, NonConstStrLen,
+   "cumulativeAdd");
+  NonConstStrLen = Val;
+} else
+  NonConstStrLen = LenWithNullAligned;
+
+StringContents.push_back(
+StringData(StringRef(), LenWithNull, LenWithNullAligned, false));
+  }
+} else {
+  int AllocSize = M->getDataLayout().getType