Author: Yonah Goldberg
Date: 2025-12-19T11:04:50-08:00
New Revision: f34d06868772bca627da07897d3d334d456427a8
URL: https://github.com/llvm/llvm-project/commit/f34d06868772bca627da07897d3d334d456427a8
DIFF: https://github.com/llvm/llvm-project/commit/f34d06868772bca627da07897d3d334d456427a8.diff

LOG: [SROA] Refactor rewritePartition alloca type selection process (#167771)

This PR does two things:

1. Refactor the rewritePartition type selection process by moving the logic
   into a dedicated helper, selectPartitionType. Previously the selection
   process made use of a mutable `SliceTy`: each phase would do something
   like `if (!SliceTy) { /* try to set SliceTy */ }`, but there were also
   phases guarded by `if (!SliceTy && <condition>)` and
   `if (!SliceTy || <condition>)`. I think this style makes the priority
   mechanism confusing. The new selection process is equivalent (except for
   the second contribution of this PR): each phase checks its condition and
   returns the selected type right away, which makes the priority order
   clearer.

2. What motivated the rewrite is that SROA fails to promote some small
   aggregate allocas with mixed-type loads and stores. For example, given:

   ```
   %alloca = alloca [2 x half]
   store i32 42, ptr %alloca
   %val = load float, ptr %alloca
   ```

   Previously, SROA would:
   - Find no common type between `i32` and `float`
   - Use `getTypePartition`, which returns `[2 x half]`
   - Create a new `alloca [2 x half]`

   This PR adds an additional check:

   ```
   if (LargestIntTy &&
       DL.getTypeAllocSize(LargestIntTy).getFixedValue() >= P.size() &&
       isIntegerWideningViable(P, LargestIntTy, DL))
     return {LargestIntTy, true, nullptr};
   ```

   which allows the alloca above to be promoted as `i32`.

   The larger rewrite helps with this because the check is very specific: I
   only want it to apply when there is no common type and all the loads and
   stores can be widened for promotion, and only after trying the subtype
   returned from getTypePartition. I also don't want it to apply when vector
   promotion is possible. The larger rewrite makes it very clear when this
   optimization occurs.

   After this change, the example is optimized to:

   ```
   %0 = bitcast i32 42 to float
   ret void
   ```

   The alloca is completely eliminated.
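To make the contrast in point 1 concrete, here is a minimal, self-contained C++ sketch of the two styles. It is illustrative only, not the actual SROA code: the string "types" and the commonType/typePartition queries are stand-ins for the real candidates and predicates.

```
#include <iostream>
#include <optional>
#include <string>

// Stand-ins for the real queries (findCommonType, getTypePartition, ...).
static std::optional<std::string> commonType() { return std::nullopt; }
static std::optional<std::string> typePartition() { return "[2 x half]"; }

// Old style: one mutable result; each phase is guarded by "not yet set",
// sometimes with extra conditions mixed in, so the priority is implicit.
static std::string selectOld() {
  std::optional<std::string> SliceTy;
  if (auto T = commonType())
    SliceTy = *T; // try to set SliceTy
  if (!SliceTy)
    if (auto T = typePartition())
      SliceTy = *T;
  if (!SliceTy)
    SliceTy = "i8 array"; // final fallback
  return *SliceTy;
}

// New style: check each candidate in priority order and return immediately,
// so the precedence reads top to bottom.
static std::string selectNew() {
  if (auto T = commonType())
    return *T;
  if (auto T = typePartition())
    return *T;
  return "i8 array";
}

int main() {
  // Both selections agree; only the control flow differs.
  std::cout << selectOld() << " == " << selectNew() << '\n';
}
```

The real helper applies the same restructuring to the full candidate list (vector type, common use type, type partition, largest integer type, legal integer type, i8 array), and that structure is where the new LargestIntTy check from point 2 is slotted in.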
Added: 
    llvm/test/Transforms/SROA/prefer-integer-partition.ll

Modified: 
    clang/test/CodeGen/arm-bf16-convert-intrinsics.c
    llvm/lib/Transforms/Scalar/SROA.cpp

Removed: 
    

################################################################################
diff --git a/clang/test/CodeGen/arm-bf16-convert-intrinsics.c b/clang/test/CodeGen/arm-bf16-convert-intrinsics.c
index 65a23dc0325c8..8a1ef2441b39d 100644
--- a/clang/test/CodeGen/arm-bf16-convert-intrinsics.c
+++ b/clang/test/CodeGen/arm-bf16-convert-intrinsics.c
@@ -196,35 +196,33 @@ bfloat16x4_t test_vcvt_bf16_f32(float32x4_t a) {
 //
 // CHECK-A32-HARDFP-LABEL: @test_vcvtq_low_bf16_f32(
 // CHECK-A32-HARDFP-NEXT:  entry:
-// CHECK-A32-HARDFP-NEXT:    [[TMP0:%.*]] = bitcast i64 0 to <4 x bfloat>
-// CHECK-A32-HARDFP-NEXT:    [[TMP1:%.*]] = bitcast <4 x float> [[A:%.*]] to <4 x i32>
-// CHECK-A32-HARDFP-NEXT:    [[TMP2:%.*]] = bitcast <4 x i32> [[TMP1]] to <16 x i8>
-// CHECK-A32-HARDFP-NEXT:    [[VCVTFP2BF_I:%.*]] = bitcast <16 x i8> [[TMP2]] to <4 x float>
+// CHECK-A32-HARDFP-NEXT:    [[TMP0:%.*]] = bitcast <4 x float> [[A:%.*]] to <4 x i32>
+// CHECK-A32-HARDFP-NEXT:    [[TMP1:%.*]] = bitcast <4 x i32> [[TMP0]] to <16 x i8>
+// CHECK-A32-HARDFP-NEXT:    [[VCVTFP2BF_I:%.*]] = bitcast <16 x i8> [[TMP1]] to <4 x float>
 // CHECK-A32-HARDFP-NEXT:    [[VCVTFP2BF1_I:%.*]] = call <4 x bfloat> @llvm.arm.neon.vcvtfp2bf.v4bf16(<4 x float> [[VCVTFP2BF_I]])
-// CHECK-A32-HARDFP-NEXT:    [[SHUFFLE_I:%.*]] = shufflevector <4 x bfloat> [[TMP0]], <4 x bfloat> [[VCVTFP2BF1_I]], <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7>
+// CHECK-A32-HARDFP-NEXT:    [[SHUFFLE_I:%.*]] = shufflevector <4 x bfloat> zeroinitializer, <4 x bfloat> [[VCVTFP2BF1_I]], <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7>
 // CHECK-A32-HARDFP-NEXT:    ret <8 x bfloat> [[SHUFFLE_I]]
 //
 // CHECK-A32-SOFTFP-LABEL: @test_vcvtq_low_bf16_f32(
 // CHECK-A32-SOFTFP-NEXT:  entry:
-// CHECK-A32-SOFTFP-NEXT:    [[TMP0:%.*]] = bitcast i64 0 to <4 x bfloat>
-// CHECK-A32-SOFTFP-NEXT:    [[TMP1:%.*]] = bitcast <4 x float> [[A:%.*]] to <4 x i32>
-// CHECK-A32-SOFTFP-NEXT:    [[TMP2:%.*]] = bitcast <4 x i32> [[TMP1]] to <16 x i8>
-// CHECK-A32-SOFTFP-NEXT:    [[VCVTFP2BF_I:%.*]] = bitcast <16 x i8> [[TMP2]] to <4 x float>
+// CHECK-A32-SOFTFP-NEXT:    [[TMP0:%.*]] = bitcast <4 x float> [[A:%.*]] to <4 x i32>
+// CHECK-A32-SOFTFP-NEXT:    [[TMP1:%.*]] = bitcast <4 x i32> [[TMP0]] to <16 x i8>
+// CHECK-A32-SOFTFP-NEXT:    [[VCVTFP2BF_I:%.*]] = bitcast <16 x i8> [[TMP1]] to <4 x float>
 // CHECK-A32-SOFTFP-NEXT:    [[VCVTFP2BF1_I:%.*]] = call <4 x i16> @llvm.arm.neon.vcvtfp2bf.v4i16(<4 x float> [[VCVTFP2BF_I]])
-// CHECK-A32-SOFTFP-NEXT:    [[TMP3:%.*]] = bitcast <4 x i16> [[VCVTFP2BF1_I]] to <4 x bfloat>
-// CHECK-A32-SOFTFP-NEXT:    [[TMP4:%.*]] = bitcast <4 x bfloat> [[TMP3]] to <2 x i32>
-// CHECK-A32-SOFTFP-NEXT:    [[TMP5:%.*]] = bitcast <2 x i32> [[TMP4]] to <4 x bfloat>
-// CHECK-A32-SOFTFP-NEXT:    [[TMP6:%.*]] = bitcast <4 x bfloat> [[TMP0]] to <2 x i32>
-// CHECK-A32-SOFTFP-NEXT:    [[TMP7:%.*]] = bitcast <4 x bfloat> [[TMP5]] to <2 x i32>
+// CHECK-A32-SOFTFP-NEXT:    [[TMP2:%.*]] = bitcast <4 x i16> [[VCVTFP2BF1_I]] to <4 x bfloat>
+// CHECK-A32-SOFTFP-NEXT:    [[TMP3:%.*]] = bitcast <4 x bfloat> [[TMP2]] to <2 x i32>
+// CHECK-A32-SOFTFP-NEXT:    [[TMP4:%.*]] = bitcast <2 x i32> [[TMP3]] to <4 x bfloat>
+// CHECK-A32-SOFTFP-NEXT:    [[TMP5:%.*]] = bitcast <4 x bfloat> zeroinitializer to <2 x i32>
+// CHECK-A32-SOFTFP-NEXT:    [[TMP6:%.*]] = bitcast <4 x bfloat> [[TMP4]] to <2 x i32>
+// CHECK-A32-SOFTFP-NEXT:    [[TMP7:%.*]] = bitcast <2 x i32> [[TMP5]] to <4 x bfloat>
 // CHECK-A32-SOFTFP-NEXT:    [[TMP8:%.*]] = bitcast <2 x i32> [[TMP6]] to <4 x bfloat>
-// CHECK-A32-SOFTFP-NEXT:    [[TMP9:%.*]] = bitcast <2 x i32> [[TMP7]] to <4 x bfloat>
-// CHECK-A32-SOFTFP-NEXT:    [[SHUFFLE_I:%.*]] = shufflevector <4 x bfloat> [[TMP8]], <4 x bfloat> [[TMP9]], <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7>
-// CHECK-A32-SOFTFP-NEXT:    [[TMP10:%.*]] = bitcast <8 x bfloat> [[SHUFFLE_I]] to <4 x i32>
-// CHECK-A32-SOFTFP-NEXT:    [[TMP11:%.*]] = bitcast <4 x i32> [[TMP10]] to <8 x bfloat>
-// CHECK-A32-SOFTFP-NEXT:    [[TMP12:%.*]] = bitcast <8 x bfloat> [[TMP11]] to <4 x i32>
-// CHECK-A32-SOFTFP-NEXT:    [[TMP13:%.*]] = bitcast <4 x i32> [[TMP12]] to <8 x bfloat>
-// CHECK-A32-SOFTFP-NEXT:    [[TMP14:%.*]] = bitcast <8 x bfloat> [[TMP13]] to <4 x i32>
-// CHECK-A32-SOFTFP-NEXT:    ret <4 x i32> [[TMP14]]
+// CHECK-A32-SOFTFP-NEXT:    [[SHUFFLE_I:%.*]] = shufflevector <4 x bfloat> [[TMP7]], <4 x bfloat> [[TMP8]], <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7>
+// CHECK-A32-SOFTFP-NEXT:    [[TMP9:%.*]] = bitcast <8 x bfloat> [[SHUFFLE_I]] to <4 x i32>
+// CHECK-A32-SOFTFP-NEXT:    [[TMP10:%.*]] = bitcast <4 x i32> [[TMP9]] to <8 x bfloat>
+// CHECK-A32-SOFTFP-NEXT:    [[TMP11:%.*]] = bitcast <8 x bfloat> [[TMP10]] to <4 x i32>
+// CHECK-A32-SOFTFP-NEXT:    [[TMP12:%.*]] = bitcast <4 x i32> [[TMP11]] to <8 x bfloat>
+// CHECK-A32-SOFTFP-NEXT:    [[TMP13:%.*]] = bitcast <8 x bfloat> [[TMP12]] to <4 x i32>
+// CHECK-A32-SOFTFP-NEXT:    ret <4 x i32> [[TMP13]]
 //
 bfloat16x8_t test_vcvtq_low_bf16_f32(float32x4_t a) {
   return vcvtq_low_bf16_f32(a);
@@ -316,4 +314,3 @@ bfloat16_t test_vcvth_bf16_f32(float32_t a) {
 float32_t test_vcvtah_f32_bf16(bfloat16_t a) {
   return vcvtah_f32_bf16(a);
 }
-
diff --git a/llvm/lib/Transforms/Scalar/SROA.cpp b/llvm/lib/Transforms/Scalar/SROA.cpp
index 1102699aa04e9..f2a85cc7af441 100644
--- a/llvm/lib/Transforms/Scalar/SROA.cpp
+++ b/llvm/lib/Transforms/Scalar/SROA.cpp
@@ -5222,6 +5222,96 @@ bool SROA::presplitLoadsAndStores(AllocaInst &AI, AllocaSlices &AS) {
   return true;
 }
 
+/// Select a partition type for an alloca partition.
+///
+/// Try to compute a friendly type for this partition of the alloca. This
+/// won't always succeed, in which case we fall back to a legal integer type
+/// or an i8 array of an appropriate size.
+///
+/// \returns A tuple with the following elements:
+/// - PartitionType: The computed type for this partition.
+/// - IsIntegerWideningViable: True if integer widening promotion is used.
+/// - VectorType: The vector type if vector promotion is used, otherwise
+///   nullptr.
+static std::tuple<Type *, bool, VectorType *>
+selectPartitionType(Partition &P, const DataLayout &DL, AllocaInst &AI,
+                    LLVMContext &C) {
+  // First check if the partition is viable for vector promotion.
+  //
+  // We prefer vector promotion over integer widening promotion when:
+  // - The vector element type is a floating-point type.
+  // - All the loads/stores to the alloca are vector loads/stores to the
+  //   entire alloca or load/store a single element of the vector.
+  //
+  // Otherwise when there is an integer vector with mixed type loads/stores
+  // we prefer integer widening promotion because it's more likely the user
+  // is doing bitwise arithmetic and we generate better code.
+  VectorType *VecTy =
+      isVectorPromotionViable(P, DL, AI.getFunction()->getVScaleValue());
+  // If the vector element type is a floating-point type, we prefer vector
+  // promotion. If the vector has one element, let the below code select
+  // whether we promote with the vector or scalar.
+  if (VecTy && VecTy->getElementType()->isFloatingPointTy() &&
+      VecTy->getElementCount().getFixedValue() > 1)
+    return {VecTy, false, VecTy};
+
+  // Check if there is a common type that all slices of the partition use
+  // that spans the partition.
+  auto [CommonUseTy, LargestIntTy] =
+      findCommonType(P.begin(), P.end(), P.endOffset());
+  if (CommonUseTy) {
+    TypeSize CommonUseSize = DL.getTypeAllocSize(CommonUseTy);
+    if (CommonUseSize.isFixed() && CommonUseSize.getFixedValue() >= P.size()) {
+      // We prefer vector promotion here because if vector promotion is
+      // viable and there is a common type used, then it implies the second
+      // listed condition for preferring vector promotion is true.
+      if (VecTy)
+        return {VecTy, false, VecTy};
+      return {CommonUseTy, isIntegerWideningViable(P, CommonUseTy, DL),
+              nullptr};
+    }
+  }
+
+  // Can we find an appropriate subtype in the original allocated type?
+  if (Type *TypePartitionTy = getTypePartition(DL, AI.getAllocatedType(),
+                                               P.beginOffset(), P.size())) {
+    // If the partition is an integer array that can be spanned by a legal
+    // integer type, prefer to represent it as a legal integer type because
+    // it's more likely to be promotable.
+    if (TypePartitionTy->isArrayTy() &&
+        TypePartitionTy->getArrayElementType()->isIntegerTy() &&
+        DL.isLegalInteger(P.size() * 8))
+      TypePartitionTy = Type::getIntNTy(C, P.size() * 8);
+    // There was no common type used, so we prefer integer widening
+    // promotion.
+    if (isIntegerWideningViable(P, TypePartitionTy, DL))
+      return {TypePartitionTy, true, nullptr};
+    if (VecTy)
+      return {VecTy, false, VecTy};
+    // If we couldn't promote with TypePartitionTy, try with the largest
+    // integer type used.
+    if (LargestIntTy &&
+        DL.getTypeAllocSize(LargestIntTy).getFixedValue() >= P.size() &&
+        isIntegerWideningViable(P, LargestIntTy, DL))
+      return {LargestIntTy, true, nullptr};
+
+    // Fallback to TypePartitionTy and we probably won't promote.
+    return {TypePartitionTy, false, nullptr};
+  }
+
+  // Select the largest integer type used if it spans the partition.
+  if (LargestIntTy &&
+      DL.getTypeAllocSize(LargestIntTy).getFixedValue() >= P.size())
+    return {LargestIntTy, false, nullptr};
+
+  // Select a legal integer type if it spans the partition.
+  if (DL.isLegalInteger(P.size() * 8))
+    return {Type::getIntNTy(C, P.size() * 8), false, nullptr};
+
+  // Fallback to an i8 array.
+  return {ArrayType::get(Type::getInt8Ty(C), P.size()), false, nullptr};
+}
+
 /// Rewrite an alloca partition's users.
 ///
 /// This routine drives both of the rewriting goals of the SROA pass. It tries
@@ -5234,47 +5324,10 @@ bool SROA::presplitLoadsAndStores(AllocaInst &AI, AllocaSlices &AS) {
 /// promoted.
 AllocaInst *SROA::rewritePartition(AllocaInst &AI, AllocaSlices &AS,
                                    Partition &P) {
-  // Try to compute a friendly type for this partition of the alloca. This
-  // won't always succeed, in which case we fall back to a legal integer type
-  // or an i8 array of an appropriate size.
-  Type *SliceTy = nullptr;
   const DataLayout &DL = AI.getDataLayout();
-  unsigned VScale = AI.getFunction()->getVScaleValue();
-
-  std::pair<Type *, IntegerType *> CommonUseTy =
-      findCommonType(P.begin(), P.end(), P.endOffset());
-  // Do all uses operate on the same type?
-  if (CommonUseTy.first) {
-    TypeSize CommonUseSize = DL.getTypeAllocSize(CommonUseTy.first);
-    if (CommonUseSize.isFixed() && CommonUseSize.getFixedValue() >= P.size())
-      SliceTy = CommonUseTy.first;
-  }
-  // If not, can we find an appropriate subtype in the original allocated type?
-  if (!SliceTy)
-    if (Type *TypePartitionTy = getTypePartition(DL, AI.getAllocatedType(),
-                                                 P.beginOffset(), P.size()))
-      SliceTy = TypePartitionTy;
-
-  // If still not, can we use the largest bitwidth integer type used?
-  if (!SliceTy && CommonUseTy.second)
-    if (DL.getTypeAllocSize(CommonUseTy.second).getFixedValue() >= P.size())
-      SliceTy = CommonUseTy.second;
-  if ((!SliceTy || (SliceTy->isArrayTy() &&
-                    SliceTy->getArrayElementType()->isIntegerTy())) &&
-      DL.isLegalInteger(P.size() * 8)) {
-    SliceTy = Type::getIntNTy(*C, P.size() * 8);
-  }
-
-  if (!SliceTy)
-    SliceTy = ArrayType::get(Type::getInt8Ty(*C), P.size());
-  assert(DL.getTypeAllocSize(SliceTy).getFixedValue() >= P.size());
-
-  bool IsIntegerPromotable = isIntegerWideningViable(P, SliceTy, DL);
-
-  VectorType *VecTy =
-      IsIntegerPromotable ? nullptr : isVectorPromotionViable(P, DL, VScale);
-  if (VecTy)
-    SliceTy = VecTy;
+  // Select the type for the new alloca that spans the partition.
+  auto [PartitionTy, IsIntegerWideningViable, VecTy] =
+      selectPartitionType(P, DL, AI, *C);
 
   // Check for the case where we're going to rewrite to a new alloca of the
   // exact same type as the original, and with the same access offsets. In that
@@ -5283,7 +5336,7 @@ AllocaInst *SROA::rewritePartition(AllocaInst &AI, AllocaSlices &AS,
   // P.beginOffset() can be non-zero even with the same type in a case with
   // out-of-bounds access (e.g. @PR35657 function in SROA/basictest.ll).
   AllocaInst *NewAI;
-  if (SliceTy == AI.getAllocatedType() && P.beginOffset() == 0) {
+  if (PartitionTy == AI.getAllocatedType() && P.beginOffset() == 0) {
     NewAI = &AI;
     // FIXME: We should be able to bail at this point with "nothing changed".
     // FIXME: We might want to defer PHI speculation until after here.
@@ -5293,10 +5346,10 @@ AllocaInst *SROA::rewritePartition(AllocaInst &AI, AllocaSlices &AS,
     const Align Alignment = commonAlignment(AI.getAlign(), P.beginOffset());
     // If we will get at least this much alignment from the type alone, leave
     // the alloca's alignment unconstrained.
-    const bool IsUnconstrained = Alignment <= DL.getABITypeAlign(SliceTy);
+    const bool IsUnconstrained = Alignment <= DL.getABITypeAlign(PartitionTy);
     NewAI = new AllocaInst(
-        SliceTy, AI.getAddressSpace(), nullptr,
-        IsUnconstrained ? DL.getPrefTypeAlign(SliceTy) : Alignment,
+        PartitionTy, AI.getAddressSpace(), nullptr,
+        IsUnconstrained ? DL.getPrefTypeAlign(PartitionTy) : Alignment,
         AI.getName() + ".sroa." + Twine(P.begin() - AS.begin()),
         AI.getIterator());
     // Copy the old AI debug location over to the new one.
@@ -5316,7 +5369,7 @@ AllocaInst *SROA::rewritePartition(AllocaInst &AI, AllocaSlices &AS,
   SmallSetVector<SelectInst *, 8> SelectUsers;
 
   AllocaSliceRewriter Rewriter(DL, AS, *this, AI, *NewAI, P.beginOffset(),
-                               P.endOffset(), IsIntegerPromotable, VecTy,
+                               P.endOffset(), IsIntegerWideningViable, VecTy,
                                PHIUsers, SelectUsers);
   bool Promotable = true;
   // Check whether we can have tree-structured merge.
diff --git a/llvm/test/Transforms/SROA/prefer-integer-partition.ll b/llvm/test/Transforms/SROA/prefer-integer-partition.ll
new file mode 100644
index 0000000000000..f5d261d8afc07
--- /dev/null
+++ b/llvm/test/Transforms/SROA/prefer-integer-partition.ll
@@ -0,0 +1,73 @@
+; NOTE: Assertions have been autogenerated by utils/update_test_checks.py UTC_ARGS: --version 6
+; RUN: opt %s -passes=sroa -S | FileCheck %s
+
+%"struct.pbrt::RaySamples" = type { %struct.anon.45, %struct.anon.46, i8, %struct.anon.47 }
+%struct.anon.45 = type { %"class.pbrt::Point2", float }
+%"class.pbrt::Point2" = type { %"class.pbrt::Tuple2" }
+%"class.pbrt::Tuple2" = type { float, float }
+%struct.anon.46 = type { float, float, %"class.pbrt::Point2" }
+%struct.anon.47 = type { float, %"class.pbrt::Point2" }
+
+define <2 x float> @subsurface_test() local_unnamed_addr {
+; CHECK-LABEL: define <2 x float> @subsurface_test() local_unnamed_addr {
+; CHECK-NEXT:    [[TMP1:%.*]] = load float, ptr inttoptr (i64 12 to ptr), align 4
+; CHECK-NEXT:    [[TMP2:%.*]] = fptosi float [[TMP1]] to i32
+; CHECK-NEXT:    [[TMP3:%.*]] = trunc i32 [[TMP2]] to i1
+; CHECK-NEXT:    br i1 [[TMP3]], label %[[BB4:.*]], label %[[_ZNK4PBRT3SOAINS_10RAYSAMPLESEEIXEI_EXIT:.*]]
+; CHECK:       [[BB4]]:
+; CHECK-NEXT:    [[TMP5:%.*]] = load volatile { <2 x float>, <2 x float> }, ptr null, align 8
+; CHECK-NEXT:    [[TMP6:%.*]] = extractvalue { <2 x float>, <2 x float> } [[TMP5]], 0
+; CHECK-NEXT:    [[TMP7:%.*]] = extractvalue { <2 x float>, <2 x float> } [[TMP5]], 1
+; CHECK-NEXT:    [[BC_I:%.*]] = bitcast <2 x float> [[TMP6]] to <2 x i32>
+; CHECK-NEXT:    [[TMP8:%.*]] = extractelement <2 x i32> [[BC_I]], i64 1
+; CHECK-NEXT:    [[BC2_I:%.*]] = bitcast <2 x float> [[TMP7]] to <2 x i32>
+; CHECK-NEXT:    [[TMP9:%.*]] = extractelement <2 x i32> [[BC2_I]], i64 0
+; CHECK-NEXT:    [[TMP12:%.*]] = bitcast i32 [[TMP8]] to float
+; CHECK-NEXT:    [[DOTSROA_1_36_VEC_INSERT:%.*]] = insertelement <2 x float> zeroinitializer, float [[TMP12]], i32 0
+; CHECK-NEXT:    [[TMP11:%.*]] = bitcast i32 [[TMP9]] to float
+; CHECK-NEXT:    [[DOTSROA_1_40_VEC_INSERT:%.*]] = insertelement <2 x float> [[DOTSROA_1_36_VEC_INSERT]], float [[TMP11]], i32 1
+; CHECK-NEXT:    br label %[[_ZNK4PBRT3SOAINS_10RAYSAMPLESEEIXEI_EXIT]]
+; CHECK:       [[_ZNK4PBRT3SOAINS_10RAYSAMPLESEEIXEI_EXIT]]:
+; CHECK-NEXT:    [[TMP10:%.*]] = phi <2 x float> [ [[DOTSROA_1_40_VEC_INSERT]], %[[BB4]] ], [ zeroinitializer, [[TMP0:%.*]] ]
+; CHECK-NEXT:    ret <2 x float> [[TMP10]]
+;
+  %1 = alloca %"struct.pbrt::RaySamples", align 4
+  %2 = getelementptr i8, ptr %1, i64 36
+  store i64 0, ptr %2, align 4
+  %3 = load float, ptr inttoptr (i64 12 to ptr), align 4
+  %4 = fptosi float %3 to i32
+  %5 = trunc i32 %4 to i1
+  br i1 %5, label %6, label %_ZNK4pbrt3SOAINS_10RaySamplesEEixEi.exit
+
+6:                                                ; preds = %0
+  %7 = load volatile { <2 x float>, <2 x float> }, ptr null, align 8
+  %8 = extractvalue { <2 x float>, <2 x float> } %7, 0
+  %9 = extractvalue { <2 x float>, <2 x float> } %7, 1
+  store float 0.000000e+00, ptr %1, align 4
+  %bc.i = bitcast <2 x float> %8 to <2 x i32>
+  %10 = extractelement <2 x i32> %bc.i, i64 1
+  %bc2.i = bitcast <2 x float> %9 to <2 x i32>
+  %11 = extractelement <2 x i32> %bc2.i, i64 0
+  store i32 %10, ptr %2, align 4
+  %.sroa_idx1.i = getelementptr i8, ptr %1, i64 40
+  store i32 %11, ptr %.sroa_idx1.i, align 4
+  br label %_ZNK4pbrt3SOAINS_10RaySamplesEEixEi.exit
+
+_ZNK4pbrt3SOAINS_10RaySamplesEEixEi.exit:         ; preds = %0, %6
+  %12 = getelementptr inbounds nuw i8, ptr %1, i64 36
+  %.sroa.01.0.copyload = load <2 x float>, ptr %12, align 4
+  ret <2 x float> %.sroa.01.0.copyload
+}
+
+define void @test_mixed_types() {
+; CHECK-LABEL: @test_mixed_types(
+; CHECK-NEXT:  entry:
+; CHECK-NEXT:    [[TMP0:%.*]] = bitcast i32 42 to float
+; CHECK-NEXT:    ret void
+;
+entry:
+  %alloca = alloca [2 x half]
+  store i32 42, ptr %alloca
+  %val = load float, ptr %alloca
+  ret void
+}
