[llvm-branch-commits] [mlir] [mlir][linalg] Restrict linalg.pack to not have artificial padding. (PR #149624)

2025-07-23 Thread Han-Chung Wang via llvm-branch-commits

https://github.com/hanhanW edited 
https://github.com/llvm/llvm-project/pull/149624


[llvm-branch-commits] [mlir] [mlir][linalg] Restrict linalg.pack to not have artificial padding. (PR #149624)

2025-07-23 Thread Han-Chung Wang via llvm-branch-commits


@@ -150,9 +150,13 @@ def Linalg_PackOp : Linalg_RelayoutOp<"pack", [
 
 `padding_value` specifies a padding value at the boundary on non-perfectly
 divisible dimensions. Padding is optional:
-- If absent, it is UB if the tile does not perfectly divide the dimension.
+- If absent, it assumes the tile perfectly divides the dimension.

hanhanW wrote:

I see what you meant. Sorry, somehow I thought you were talking about the other
case; my brain must have been in a messy state when I replied to the comments.
Yes, I found an invalid test here:

https://github.com/llvm/llvm-project/blob/45d99c26c3513945a454e90b69d48257886f8284/mlir/test/Dialect/Linalg/invalid.mlir#L1692-L1696
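
For context, a minimal sketch of the no-padding case described above (hypothetical shapes; the tile divides the dimension exactly, so `padding_value` may be omitted):

```mlir
// 16 is evenly divisible by the inner tile size 8, so the source splits
// into two complete tiles and no padding_value is required.
%0 = linalg.pack %src inner_dims_pos = [0] inner_tiles = [8]
    into %dest : tensor<16xf32> -> tensor<2x8xf32>
```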

https://github.com/llvm/llvm-project/pull/149624


[llvm-branch-commits] [mlir] [mlir][linalg] Restrict linalg.pack to not have artificial padding. (PR #149624)

2025-07-23 Thread Han-Chung Wang via llvm-branch-commits

hanhanW wrote:

> I think you should drop @2548809 from this PR, so we could handle #150127
> separately :) Or target that branch from this one if you're planning to get
> that one in first. Otherwise LGTM.

I need to land the other one first; otherwise, the CI is not happy. I recreated
the other PR because the branch needs to live upstream and has to be
associated with a PR: https://github.com/llvm/llvm-project/pull/150272

> It is possible to create branches in llvm/llvm-project/ that start with
> users/<username>/, however this is intended to be able to support “stacked”
> pull-request. Do not create any branches in the llvm/llvm-project repository
> otherwise, please use a fork (see above). User branches that aren’t
> associated with a pull-request will be deleted.

https://llvm.org/docs/GitHub.html#branches

https://github.com/llvm/llvm-project/pull/149624


[llvm-branch-commits] [mlir] [mlir][linalg] Restrict linalg.pack to not have artificial padding. (PR #149624)

2025-07-23 Thread Han-Chung Wang via llvm-branch-commits

https://github.com/hanhanW edited 
https://github.com/llvm/llvm-project/pull/149624


[llvm-branch-commits] [mlir] [mlir][linalg] Restrict linalg.pack to not have artificial padding. (PR #149624)

2025-07-24 Thread Han-Chung Wang via llvm-branch-commits

https://github.com/hanhanW closed 
https://github.com/llvm/llvm-project/pull/149624


[llvm-branch-commits] [mlir] [mlir][linalg] Enable scalable vectorization of linalg.unpack (PR #149293)

2025-07-31 Thread Han-Chung Wang via llvm-branch-commits


@@ -1876,22 +1872,35 @@ static VectorType getCollapsedVecType(VectorType type,
   return VectorType::get(newShape, type.getElementType(), newScalableFlags);
 }
 
-/// Vectorize a `linalg::UnPackOp` to these 4 Ops:
-///   Vector::TransferReadOp - Reads a vector from the source tensor
-///   vector::TransposeOp - Transpose the Source tensor
-///   ShapeCastOp - Reshape the data based on the target.
-///   vector::TransferWriteOp. - Write the result vector back to the destination
-///   tensor.
-///   If the vector sizes are not provided:
-/// Vectorize `linalg.unpack %src into %dest` as:
-///   // Reads a vector from the source tensor
-///   %read = vector.transfer_read  %src
-///   // Transpose %read as specified in `outer_dims_perm` attribute
-///   %tr = vector.transpose %read
-///   // Reshape the data based on the target
-///   %sc = vector.shape_cast %tr
-///   // Write the result vector to the destination tensor.
-///   vector.transfer_write %sc into %dest
+/// Vectorize `linalg.unpack` into:
+///   * xfer_read -> vector.transpose -> vector.shape_cast -> xfer_write
+///
+/// The input-vector-sizes specify both the read and the write vector
+/// sizes and are passed as one array covering both operations, i.e.:
+///
+///  input-vector-sizes = [1, 1, 8, [8],  8, [8]]
+///                        \         /   \     /
+///                         read-sizes   write-sizes
+///
+/// (for brevity, in the diagram,
+///  * input-vector-sizes = `inputVectorSizes` + `inputScalableDims`
+/// )
+///
+/// If the vector sizes are not provided:
+///  * the vector sizes are determined by the operands,
+///  * the inBounds attribute is used instead of masking.
+///
+/// EXAMPLE (no vector sizes):
+/// ```
+///   %unpack = linalg.unpack  %src
+///inner_dims_pos = [0, 1]
+///inner_tiles = [8, 8]
+///into %dest : tensor<1x1x8x8xf32> -> tensor<8x8xf32>
+/// ```
+/// is vectorized as:
+/// ```
+///   vector.transfer_write %sc into %dest : vector<8x8xf32>, tensor<8x8xf32>
+/// ```
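
As a rough usage sketch of the combined size list (assuming the upstream `transform.structured.vectorize` op, with scalable entries written in brackets; the shapes are illustrative):

```mlir
// Hypothetical invocation: the first four entries are the read sizes,
// the last two the write sizes, matching the diagram above.
transform.structured.vectorize %unpack vector_sizes [1, 1, 8, [8], 8, [8]]
    : !transform.any_op
```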

hanhanW wrote:

You deleted a few ops in
https://github.com/llvm/llvm-project/pull/149293/commits/b073854c4e681f2a8c89528ae4108f8bf7c6193f.
I think we should add the xfer_read, transpose and shape_cast back.
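
For reference, a sketch of the full four-op chain for the `tensor<1x1x8x8xf32> -> tensor<8x8xf32>` example above (indices elided; the transpose permutation is illustrative, assuming the identity `outer_dims_perm`):

```mlir
// Read the source tensor as a vector.
%read = vector.transfer_read %src[...] : tensor<1x1x8x8xf32>, vector<1x1x8x8xf32>
// Interleave the outer dims with their inner tile dims.
%tr = vector.transpose %read, [0, 2, 1, 3] : vector<1x1x8x8xf32> to vector<1x8x1x8xf32>
// Collapse to the destination shape.
%sc = vector.shape_cast %tr : vector<1x8x1x8xf32> to vector<8x8xf32>
// Write the result back to the destination tensor.
vector.transfer_write %sc, %dest[...] : vector<8x8xf32>, tensor<8x8xf32>
```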

https://github.com/llvm/llvm-project/pull/149293


[llvm-branch-commits] [mlir] [mlir][linalg] Enable scalable vectorization of linalg.unpack (PR #149293)

2025-07-31 Thread Han-Chung Wang via llvm-branch-commits


@@ -1871,115 +1872,98 @@ static VectorType getCollapsedVecType(VectorType type,
   return VectorType::get(newShape, type.getElementType(), newScalableFlags);
 }
 
-/// Vectorize a `linalg::UnPackOp` to these 4 Ops:
-///   Vector::TransferReadOp - Reads a vector from the source tensor
-///   vector::TransposeOp - Transpose the Source tensor
-///   ShapeCastOp - Reshape the data based on the target.
-///   vector::TransferWriteOp. - Write the result vector back to the destination
-///   tensor.
-///   If the vector sizes are not provided:
-///   * the vector sizes are determined by the input operand and attributes,
-///   * update the inBounds attribute instead of masking.
+/// Vectorize `linalg.unpack` as:
+///   * xfer_read -> vector.transpose -> vector.shape_cast -> xfer_write
+///
+/// The input-vector-sizes specify the read vector sizes (i.e. the vector sizes
+/// for the xfer_read operation). This is sufficient to infer the other vector
+/// sizes required here.
+///
+/// If the vector sizes are not provided:
+///  * the vector sizes are determined by the operands,
+///  * the inBounds attribute is used instead of masking.
+///
+/// EXAMPLE (no vector sizes):
+/// ```
+///   %unpack = linalg.unpack  %src
+///inner_dims_pos = [0, 1]
+///inner_tiles = [8, 8]
+///into %dest : tensor<1x1x8x8xf32> -> tensor<8x8xf32>
+/// ```
+/// is vectorized as:
+/// ```
+///   vector.transfer_write %sc into %dest : vector<8x8xf32>, tensor<8x8xf32>
+/// ```
 static LogicalResult
 vectorizeAsTensorUnpackOp(RewriterBase &rewriter, linalg::UnPackOp unpackOp,
                           ArrayRef<int64_t> inputVectorSizes,
+                          ArrayRef<bool> inputScalableVecDims,
                           SmallVectorImpl<Value> &newResults) {
+  if (!inputVectorSizes.empty()) {
+    assert(inputVectorSizes.size() == unpackOp.getSourceRank() &&
+           "Invalid number of input vector sizes!");
+    assert(inputVectorSizes.size() == inputScalableVecDims.size() &&
+           "Incompatible number of vector sizes and vector scalable flags!");
+  }
 
   // TODO: Introduce a parent class that will handle the insertion point update.
   OpBuilder::InsertionGuard g(rewriter);
   rewriter.setInsertionPoint(unpackOp);
 
   RankedTensorType unpackTensorType = unpackOp.getSourceType();
 
-  ArrayRef<int64_t> innerDimPos = unpackOp.getInnerDimsPos();
-  ArrayRef<int64_t> innerTiles = unpackOp.getStaticInnerTiles();
   ArrayRef<int64_t> sourceShape = unpackTensorType.getShape();
+  ArrayRef<int64_t> destShape = unpackOp.getDestType().getShape();
   bool useInBoundsInsteadOfMasking = false;
-  ArrayRef<int64_t> outerDimsPerm = unpackOp.getOuterDimsPerm();
-
-  auto destSize = unpackOp.getDestRank();
-
-  if (!inputVectorSizes.empty())
-    assert(inputVectorSizes.size() == destSize &&
-           "Incorrect number of input vector sizes");
-
-  // vectorSizes is the shape of the vector that will be used to do final
-  // write on the destination tensor. It is set like this: Let's say the
-  // source tensor is rank 'M' and the dest tensor rank 'N', where N <= M.
-  // Thus:
-  // 1. vectorSizes = sourceShape.take_front(N)
-  // 2. if outer_dims_perms is present: do that permutation on vectorSizes.
-  // 3. multiply all the locations in vectorSize pointed by innerDimPos by the
-  //innerTiles attribute value.
-  SmallVector<int64_t> vectorSizes(inputVectorSizes);
-  if (vectorSizes.empty()) {
-    llvm::append_range(vectorSizes, sourceShape.take_front(destSize));
-    if (!outerDimsPerm.empty())
-      applyPermutationToVector(vectorSizes, outerDimsPerm);
-    for (auto [i, pos] : llvm::enumerate(innerDimPos))
-      vectorSizes[pos] *= innerTiles[i];
-
-    useInBoundsInsteadOfMasking = true;
-  }
+  Location loc = unpackOp->getLoc();
 
-  // readVectorSizes is the size of tensor used to read and apply mask. It is
-  // set like this: Let's say the vectorSize (VS) array is size 'N' and
-  // the sourceShape(SS) is 'M' where M >= N and InnerTileSizes (IT) of
-  // size M-N
-  // Thus:
-  // - initially: readVectorSizes = vectorInputSizes
-  // - Divide all the readMaskShape locations pointed by innerDimPos
-  //   by the innerTileSize attribute value.
-  // - if outer_dims_perms is present: do that permutation on readVectorSizes.
-  // - Append the remaining shape from SS
-  // E.g. let's say let's say unpackTensorType.getShape() = <8x8x32x16>
-  // inner Dim Pos = [0, 1] and Inner Tiles = [32, 16], vector_sizes are [512,
-  // 128] and outer_dims_perm is [1, 0] then read shape is:
-  //   ReadVectorSizes(initial): [512, 128]
-  //   Final Value(after innerDim Adjustment): [512/32, 128/16]
-  //   = [16, 8]
-  //   After applying outer_dims_perm: [8, 16]
-  //   After appending the rest of the sourceShape: [8, 16, 32, 16]
-
-  SmallVector<int64_t> readVectorSizes(vectorSizes.begin(), vectorSizes.end());
-
-  for (auto [index, size] : enumerate(innerTiles)) {
-    readVectorSizes[innerDimPos[index]] =
-        llvm::divideCeil(readVectorSizes[i

[llvm-branch-commits] [mlir] [mlir][linalg] Enable scalable vectorization of linalg.unpack (PR #149293)

2025-07-31 Thread Han-Chung Wang via llvm-branch-commits

https://github.com/hanhanW edited 
https://github.com/llvm/llvm-project/pull/149293


[llvm-branch-commits] [mlir] [mlir][linalg] Enable scalable vectorization of linalg.unpack (PR #149293)

2025-07-31 Thread Han-Chung Wang via llvm-branch-commits

https://github.com/hanhanW approved this pull request.

LGTM, thanks! Just 3 nits.

https://github.com/llvm/llvm-project/pull/149293


[llvm-branch-commits] [mlir] [mlir][linalg] Restrict linalg.pack to not have artificial padding. (PR #149624)

2025-07-24 Thread Han-Chung Wang via llvm-branch-commits


@@ -150,9 +150,15 @@ def Linalg_PackOp : Linalg_RelayoutOp<"pack", [
 
 `padding_value` specifies a padding value at the boundary on non-perfectly
 divisible dimensions. Padding is optional:
-- If absent, it is UB if the tile does not perfectly divide the dimension.
+- If absent, it is assumed that for all inner tiles,
+  `shape(source)[inner_dims_pos[i]] % inner_tiles[i] == 0`, i.e. all inner
+  tiles divide perfectly the corresponding outer dimension in the result
+  tensor.
 - If present, it will pad along high dimensions (high-padding) to make the
-  tile complete.
+  tile complete. Note that it is not allowed to have artificial padding that
+  is not strictly required by linalg.pack (i.e., padding past what is needed
+  to complete the last tile along each packed dimension). It is UB if extra
+  padding is requested.

hanhanW wrote:

Enforcing that is more of a dynamic check than a static check, i.e., in most
cases it can only be asserted at runtime, IMO. E.g., you don't do any
out-of-bounds checks on tensor.extract_slice for dynamic sizes/dims, but you do
out-of-bounds checks when the op has static sizes and offsets.
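
A sketch of the analogy (hypothetical shapes, using standard `tensor.extract_slice` syntax):

```mlir
// Statically out of bounds (offset 8 + size 4 > 10): a verifier can reject this.
%a = tensor.extract_slice %t[8] [4] [1] : tensor<10xf32> to tensor<4xf32>
// Dynamic offset: out-of-bounds can only be detected at runtime, so it verifies.
%b = tensor.extract_slice %t[%off] [4] [1] : tensor<10xf32> to tensor<4xf32>
```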

https://github.com/llvm/llvm-project/pull/149624


[llvm-branch-commits] [mlir] [mlir][linalg] Restrict linalg.pack to not have artificial padding. (PR #149624)

2025-07-24 Thread Han-Chung Wang via llvm-branch-commits

https://github.com/hanhanW updated 
https://github.com/llvm/llvm-project/pull/149624

From 6de929abea2ad4fac10605e333050b64eedbb904 Mon Sep 17 00:00:00 2001
From: hanhanW 
Date: Tue, 22 Jul 2025 15:15:08 -0700
Subject: [PATCH 1/4] [mlir][linalg] Restrict linalg.pack to not have extra
 padding sizes.

Signed-off-by: hanhanW 
---
 .../Dialect/Linalg/IR/LinalgRelayoutOps.td| 14 +++-
 mlir/lib/Dialect/Linalg/IR/LinalgOps.cpp  | 28 ++-
 .../Transforms/PackAndUnpackPatterns.cpp  |  1 +
 mlir/test/Dialect/Linalg/canonicalize.mlir| 25 +++---
 .../Linalg/data-layout-propagation.mlir   | 18 -
 mlir/test/Dialect/Linalg/invalid.mlir | 17 ++--
 .../Dialect/Linalg/transform-lower-pack.mlir  | 16 ++--
 .../tile-and-fuse-consumer.mlir   | 81 ---
 8 files changed, 50 insertions(+), 150 deletions(-)

diff --git a/mlir/include/mlir/Dialect/Linalg/IR/LinalgRelayoutOps.td b/mlir/include/mlir/Dialect/Linalg/IR/LinalgRelayoutOps.td
index fa572024ff72b..f5dd7ae2c84f3 100644
--- a/mlir/include/mlir/Dialect/Linalg/IR/LinalgRelayoutOps.td
+++ b/mlir/include/mlir/Dialect/Linalg/IR/LinalgRelayoutOps.td
@@ -150,9 +150,10 @@ def Linalg_PackOp : Linalg_RelayoutOp<"pack", [
 
 `padding_value` specifies a padding value at the boundary on non-perfectly
 divisible dimensions. Padding is optional:
-- If absent, it is UB if the tile does not perfectly divide the dimension.
+- If absent, it assumes the tile perfectly divides the dimension.
 - If present, it will pad along high dimensions (high-padding) to make the
-  tile complete.
+  tile complete. Note that it is not allowed to have artificial padding that
+  is not strictly required by linalg.pack.
 
 Example:
 ```mlir
@@ -167,6 +168,15 @@ def Linalg_PackOp : Linalg_RelayoutOp<"pack", [
 //
 // Note: Only tiled dimensions can be padded.
 ```
+
+Invalid example that has artificial padding:
+```mlir
+%0 = linalg.pack %src padding_value(%cst : f32) inner_dims_pos = [0]
+inner_tiles = [8] into %dest
+: tensor<9xf32> -> tensor<3x8xf32>
+//                              \
+//               expect tensor<2x8xf32> because CeilDiv(9, 8) = 2
+```
   }];
   let arguments = (ins AnyRankedTensor:$source,
AnyRankedTensor:$dest,
diff --git a/mlir/lib/Dialect/Linalg/IR/LinalgOps.cpp b/mlir/lib/Dialect/Linalg/IR/LinalgOps.cpp
index 046a73c90f110..2d473daa1e1f4 100644
--- a/mlir/lib/Dialect/Linalg/IR/LinalgOps.cpp
+++ b/mlir/lib/Dialect/Linalg/IR/LinalgOps.cpp
@@ -32,6 +32,7 @@
 #include "mlir/IR/OpImplementation.h"
 #include "mlir/IR/OperationSupport.h"
 #include "mlir/IR/PatternMatch.h"
+#include "mlir/IR/TypeUtilities.h"
 #include "mlir/Interfaces/InferTypeOpInterface.h"
 #include "mlir/Interfaces/SideEffectInterfaces.h"
 
@@ -4622,22 +4623,6 @@ static bool isInvalidPackingPosSpecification(ArrayRef<int64_t> dimsPos,
   });
 }
 
-/// Returns true if the dimension of `sourceShape` is smaller than the dimension
-/// of the `limitShape`.
-static bool areAllInBound(ArrayRef<int64_t> sourceShape,
-                          ArrayRef<int64_t> limitShape) {
-  assert(
-      sourceShape.size() == limitShape.size() &&
-      "expected source shape rank, and limit of the shape to have same rank");
-  return llvm::all_of(
-      llvm::zip(sourceShape, limitShape), [](std::tuple<int64_t, int64_t> it) {
-        int64_t sourceExtent = std::get<0>(it);
-        int64_t limit = std::get<1>(it);
-        return ShapedType::isDynamic(sourceExtent) ||
-               ShapedType::isDynamic(limit) || sourceExtent <= limit;
-      });
-}
-
 template <typename OpTy>
 static LogicalResult commonVerifierPackAndUnPackOp(OpTy packOrUnPack) {
   static_assert(llvm::is_one_of<OpTy, PackOp, UnPackOp>::value,
@@ -4696,11 +4681,6 @@ static LogicalResult commonVerifierPackAndUnPackOp(OpTy packOrUnPack) {
   // represents full tiles.
   RankedTensorType expectedPackedType = PackOp::inferPackedType(
   unpackedType, packOrUnPack.getStaticTiles(), innerDimsPos, outerDimPerm);
-  if (!areAllInBound(expectedPackedType.getShape(), packedType.getShape())) {
-return op->emitError("the shape of output is not large enough to hold the "
- "packed data. Expected at least ")
-   << expectedPackedType << ", got " << packedType;
-  }
   if (!llvm::all_of(
   llvm::zip(packedType.getShape().take_back(mixedTiles.size()),
 mixedTiles),
@@ -4717,6 +4697,12 @@ static LogicalResult commonVerifierPackAndUnPackOp(OpTy packOrUnPack) {
     return op->emitError("mismatch in inner tile sizes specified and shaped of "
                          "tiled dimension in the packed type");
   }
+  if (failed(verifyCompatibleShape(expectedPackedType.getShape(),
+                                   packedType.getShape()))) {
+return op->emitError("the shape of unpacked domain value is not large "
+ "enough to hold the packed data. Expected at least ")
+   << expectedPackedTy

[llvm-branch-commits] [mlir] [mlir][linalg] Restrict linalg.pack to not have artificial padding. (PR #149624)

2025-07-24 Thread Han-Chung Wang via llvm-branch-commits




hanhanW wrote:

Yes, updated. Thanks for pointing it out!

https://github.com/llvm/llvm-project/pull/149624


[llvm-branch-commits] [mlir] [mlir][linalg] Restrict linalg.pack to not have artificial padding. (PR #149624)

2025-07-24 Thread Han-Chung Wang via llvm-branch-commits


@@ -4717,6 +4697,12 @@ static LogicalResult commonVerifierPackAndUnPackOp(OpTy packOrUnPack) {
     return op->emitError("mismatch in inner tile sizes specified and shaped of "
                          "tiled dimension in the packed type");
   }
+  if (failed(verifyCompatibleShape(expectedPackedType.getShape(),
+                                   packedType.getShape()))) {
+return op->emitError("expected ")
+   << expectedPackedType << " for the unpacked domain value, got "

hanhanW wrote:

good catch, thanks!

https://github.com/llvm/llvm-project/pull/149624


[llvm-branch-commits] [mlir] [mlir][linalg] Restrict linalg.pack to not have artificial padding. (PR #149624)

2025-07-24 Thread Han-Chung Wang via llvm-branch-commits


@@ -1824,27 +1825,47 @@ func.func @unpack_invalid_outer_dims_perm(%source: tensor<128x256xf32>, %dest: t
 
// -----
 
+func.func @pack_with_artificial_padding(%input: tensor<9xf32>, %output: tensor<3x8xf32>) -> tensor<3x8xf32> {
+  %cst = arith.constant 0.0 : f32
+  // expected-error@+1 {{expected 'tensor<2x8xf32>' for the unpacked domain value, got 'tensor<3x8xf32>'}}
+  %0 = linalg.pack %input padding_value(%cst : f32) inner_dims_pos = [0]
+      inner_tiles = [8] into %output
+      : tensor<9xf32> -> tensor<3x8xf32>
+  return %0 : tensor<3x8xf32>
+}
+
+// -
+
 // The outer dims in the output tensor are incorrectly/unexpectedly transposed.
 // This could be fixed by adding `outer_dims_perm = [1, 0]` (the default value assumes no transpose).
 func.func @pack_invalid_result_shape(%input: tensor<256x128xf32>, %output: tensor<4x16x32x16xf32>) -> tensor<4x16x32x16xf32> {
-  // expected-error@+1 {{the shape of output is not large enough to hold the packed data. Expected at least 'tensor<16x4x32x16xf32>', got 'tensor<4x16x32x16xf32>'}}
+  // expected-error@+1 {{expected 'tensor<16x4x32x16xf32>' for the unpacked domain value, got 'tensor<4x16x32x16xf32>'}}
   %0 = linalg.pack %input inner_dims_pos = [1, 0] inner_tiles = [32, 16] into %output : tensor<256x128xf32> -> tensor<4x16x32x16xf32>
   return %0 : tensor<4x16x32x16xf32>
 }
 
// -----
 
-func.func @pack_invalid(%input: tensor<256x128xf32>, %output: tensor<8x8x32x16xf32>) -> tensor<8x8x32x16xf32> {
-  // expected-error@+1 {{the shape of output is not large enough to hold the packed data. Expected at least 'tensor<8x8x16x32xf32>', got 'tensor<8x8x32x16xf32>'}}
-  %0 = linalg.pack %input inner_dims_pos = [1, 0] inner_tiles = [16, 32] into %output : tensor<256x128xf32> -> tensor<8x8x32x16xf32>
-  return %0 : tensor<8x8x32x16xf32>
+func.func @pack_invalid(%input: tensor<256x128xf32>, %output: tensor<8x7x16x32xf32>) -> tensor<8x7x16x32xf32> {
+  // expected-error@+1 {{expected 'tensor<8x8x16x32xf32>' for the unpacked domain value, got 'tensor<8x7x16x32xf32>'}}
+  %0 = linalg.pack %input inner_dims_pos = [1, 0] inner_tiles = [16, 32] into %output : tensor<256x128xf32> -> tensor<8x7x16x32xf32>
+  return %0 : tensor<8x7x16x32xf32>
+}
+
+// -
+
+func.func @unpack_with_slicing_tiles(%input: tensor<3x8xf32>, %output: tensor<9xf32>) -> tensor<9xf32> {

hanhanW wrote:

SG, let's use `artificial` for consistency.

https://github.com/llvm/llvm-project/pull/149624


[llvm-branch-commits] [mlir] [mlir][linalg] Restrict linalg.pack to not have artificial padding. (PR #149624)

2025-07-24 Thread Han-Chung Wang via llvm-branch-commits


@@ -150,9 +150,15 @@ def Linalg_PackOp : Linalg_RelayoutOp<"pack", [
 
 `padding_value` specifies a padding value at the boundary on non-perfectly
 divisible dimensions. Padding is optional:
-- If absent, it is UB if the tile does not perfectly divide the dimension.
+- If absent, it is assumed that for all inner tiles,
+  `shape(source)[inner_dims_pos[i]] % inner_tiles[i] == 0`, i.e. all inner
+  tiles divide perfectly the corresponding outer dimension in the result
+  tensor.
 - If present, it will pad along high dimensions (high-padding) to make the
-  tile complete.
+  tile complete. Note that it is not allowed to have artificial padding that
+  is not strictly required by linalg.pack (i.e., padding past what is needed
+  to complete the last tile along each packed dimension). It is UB if extra
+  padding is requested.

hanhanW wrote:

Restored the UB for the previous point, because it also happens in dynamic 
shape that I forgot. I also added one more sentence about why they are UB. Does 
it look better?
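
A sketch of the dynamic case being referred to (hypothetical shapes): the verifier cannot tell statically whether padding would be needed, so misuse is UB rather than a verification error:

```mlir
// If the runtime extent of %src is not a multiple of 8, this op is UB
// because padding_value is absent; no static check can catch it.
%0 = linalg.pack %src inner_dims_pos = [0] inner_tiles = [8]
    into %dest : tensor<?xf32> -> tensor<?x8xf32>
```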

https://github.com/llvm/llvm-project/pull/149624