[llvm-branch-commits] [mlir] Users/hsiangkai/winograd ops transform (PR #96177)
https://github.com/Hsiangkai created https://github.com/llvm/llvm-project/pull/96177

>From 276ed8981c5243696da3bf233a777e1b84f11131 Mon Sep 17 00:00:00 2001
From: Hsiangkai Wang
Date: Mon, 17 Jun 2024 11:24:07 +0100
Subject: [PATCH 1/2] [mlir][linalg] Implement Conv2D using Winograd Conv2D
 algorithm

Define high-level Winograd operators and convert conv_2d_nhwc_fhwc into
Winograd operators. According to the Winograd Conv2D algorithm, we need
three transform operators, for input, filter, and output transformation.

The formula of the Winograd Conv2D algorithm is

  Y = A^T x [(G x g x G^T) @ (B^T x d x B)] x A

  filter transform: G x g x G^T
  input transform:  B^T x d x B
  output transform: A^T x y x A

The implementation is based on the paper "Fast Algorithms for
Convolutional Neural Networks" (https://arxiv.org/abs/1509.09308).
---
 .../mlir/Dialect/Linalg/IR/LinalgOps.td       | 114 +++
 .../Dialect/Linalg/Transforms/Transforms.h    |   4 +
 mlir/lib/Dialect/Linalg/IR/LinalgOps.cpp      |  78 +
 .../Dialect/Linalg/Transforms/CMakeLists.txt  |   1 +
 .../Linalg/Transforms/WinogradConv2D.cpp      | 321 ++
 mlir/test/Dialect/Linalg/winograd-conv2d.mlir | 248 ++
 .../Dialect/Linalg/TestLinalgTransforms.cpp   |  13 +
 7 files changed, 779 insertions(+)
 create mode 100644 mlir/lib/Dialect/Linalg/Transforms/WinogradConv2D.cpp
 create mode 100644 mlir/test/Dialect/Linalg/winograd-conv2d.mlir

diff --git a/mlir/include/mlir/Dialect/Linalg/IR/LinalgOps.td b/mlir/include/mlir/Dialect/Linalg/IR/LinalgOps.td
index 64c538367267d..de1097b6ac27b 100644
--- a/mlir/include/mlir/Dialect/Linalg/IR/LinalgOps.td
+++ b/mlir/include/mlir/Dialect/Linalg/IR/LinalgOps.td
@@ -154,4 +154,118 @@ def Linalg_SoftmaxOp : Linalg_Op<"softmax",
   let hasVerifier = 1;
 }
 
+def Linalg_WinogradFilterTransformOp : Linalg_Op<"winograd_filter_transform"> {
+  let summary = "Winograd filter transform operator";
+  let description = [{
+    The Winograd Conv2D algorithm converts a linalg Conv2D operator into a
+    batched matrix multiply. Before the matrix multiply, it converts the
+    filter and the input into a format suitable for batched matrix multiply.
+    After the matrix multiply, it converts the output to the final result
+    tensor.
+
+    The algorithm F(m x m, r x r) is
+
+    Y = A^T x [(G x g x G^T) @ (B^T x d x B)] x A
+
+    The size of output Y is m x m. The size of filter g is r x r. The size
+    of input d is (m + r - 1) x (m + r - 1). A^T, A, G^T, G, B^T, and B are
+    transformation matrices.
+
+    This operator is defined to represent the high-level concept of filter
+    transformation (G x g x G^T) in the Winograd Conv2D algorithm.
+  }];
+
+  let arguments = (ins AnyRankedTensor:$filter,
+                       AnyRankedTensor:$output,
+                       I64Attr:$m,
+                       I64Attr:$r
+  );
+
+  let results = (outs AnyRankedTensor:$result);
+  let assemblyFormat = [{
+    attr-dict
+    `m` `(` $m `)`
+    `r` `(` $r `)`
+    `ins` `(` $filter `:` type($filter) `)`
+    `outs` `(` $output `:` type($output) `)`
+    `->` type($result)
+  }];
+  let hasVerifier = 1;
+}
+
+def Linalg_WinogradInputTransformOp : Linalg_Op<"winograd_input_transform"> {
+  let summary = "Winograd input transform operator";
+  let description = [{
+    The Winograd Conv2D algorithm converts a linalg Conv2D operator into a
+    batched matrix multiply. Before the matrix multiply, it converts the
+    filter and the input into a format suitable for batched matrix multiply.
+    After the matrix multiply, it converts the output to the final result
+    tensor.
+
+    The algorithm F(m x m, r x r) is
+
+    Y = A^T x [(G x g x G^T) @ (B^T x d x B)] x A
+
+    The size of output Y is m x m. The size of filter g is r x r. The size
+    of input d is (m + r - 1) x (m + r - 1). A^T, A, G^T, G, B^T, and B are
+    transformation matrices.
+
+    This operator is defined to represent the high-level concept of input
+    transformation (B^T x d x B) in the Winograd Conv2D algorithm.
+  }];
+
+  let arguments = (ins AnyRankedTensor:$input,
+                       AnyRankedTensor:$output,
+                       I64Attr:$m,
+                       I64Attr:$r
+  );
+
+  let results = (outs AnyRankedTensor:$result);
+  let assemblyFormat = [{
+    attr-dict
+    `m` `(` $m `)`
+    `r` `(` $r `)`
+    `ins` `(` $input `:` type($input) `)`
+    `outs` `(` $output `:` type($output) `)`
+    `->` type($result)
+  }];
+  let hasVerifier = 1;
+}
+
+def Linalg_WinogradOutputTransformOp : Linalg_Op<"winograd_output_transform"> {
+  let summary = "Winograd output transform operator";
+  let description = [{
+    The Winograd Conv2D algorithm converts a linalg Conv2D operator into a
+    batched matrix multiply. Before the matrix multiply, it converts the
+    filter and the input into a format suitable for batched matrix multiply.
+    After the matrix multiply, it converts the output to the final result
+    tensor.
+
+    The algorithm F(m x m, r
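The identity Y = A^T x [(G x g x G^T) @ (B^T x d x B)] x A that the three ops above model can be checked numerically. The sketch below instantiates F(2 x 2, 3 x 3) with the standard transform matrices from the cited Lavin and Gray paper and compares the Winograd result against a direct 2-D correlation. It is illustrative only and not part of the patch; the function names and sample data are made up for the check.

```python
# Numerical check of the Winograd identity Y = A^T [(G g G^T) (.) (B^T d B)] A
# for F(2x2, 3x3). Transform matrices follow Lavin and Gray,
# "Fast Algorithms for Convolutional Neural Networks" (arXiv:1509.09308).

def matmul(a, b):
    """Plain dense matrix multiply over small lists-of-lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def transpose(a):
    return [list(row) for row in zip(*a)]

# One-dimensional F(2, 3) transform matrices.
BT = [[1, 0, -1, 0], [0, 1, 1, 0], [0, -1, 1, 0], [0, 1, 0, -1]]
G  = [[1, 0, 0], [0.5, 0.5, 0.5], [0.5, -0.5, 0.5], [0, 0, 1]]
AT = [[1, 1, 1, 0], [0, 1, -1, -1]]

def winograd_2x2_3x3(d, g):
    """m=2, r=3: d is a 4x4 input tile, g a 3x3 filter, result is 2x2."""
    U = matmul(matmul(G, g), transpose(G))    # filter transform: G g G^T
    V = matmul(matmul(BT, d), transpose(BT))  # input transform:  B^T d B
    M = [[U[i][j] * V[i][j] for j in range(4)] for i in range(4)]  # elementwise
    return matmul(matmul(AT, M), transpose(AT))  # output transform: A^T M A

def direct_conv(d, g):
    """Reference: valid 2-D correlation of a 4x4 tile with a 3x3 filter."""
    return [[sum(d[i + a][j + b] * g[a][b] for a in range(3) for b in range(3))
             for j in range(2)] for i in range(2)]

d = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]]
g = [[1, 0, -1], [2, 0, -2], [1, 0, -1]]
assert all(abs(winograd_2x2_3x3(d, g)[i][j] - direct_conv(d, g)[i][j]) < 1e-9
           for i in range(2) for j in range(2))
```

The same structure explains the op signatures: the filter transform consumes only g, the input transform only d, and the output transform combines the elementwise product back through A.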
[llvm-branch-commits] [mlir] [mlir][linalg] Add transform operator for Winograd Conv2D algorithm (PR #96177)
https://github.com/Hsiangkai edited https://github.com/llvm/llvm-project/pull/96177

_______________________________________________
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [mlir] [mlir][linalg] Add transform operator for Winograd Conv2D algorithm (PR #96177)
https://github.com/Hsiangkai updated https://github.com/llvm/llvm-project/pull/96177
[llvm-branch-commits] [mlir] [mlir][linalg] Add transform operator for Winograd Conv2D algorithm (PR #96177)
https://github.com/Hsiangkai updated https://github.com/llvm/llvm-project/pull/96177
[llvm-branch-commits] [mlir] [mlir][linalg] Add transform operator for Winograd Conv2D algorithm (PR #96177)
https://github.com/Hsiangkai updated https://github.com/llvm/llvm-project/pull/96177
[llvm-branch-commits] [mlir] [mlir][linalg] Add transform operator for Winograd Conv2D algorithm (PR #96177)
https://github.com/Hsiangkai updated https://github.com/llvm/llvm-project/pull/96177
[llvm-branch-commits] [mlir] [mlir][linalg] Add transform operator for Winograd Conv2D algorithm (PR #96182)
https://github.com/Hsiangkai created https://github.com/llvm/llvm-project/pull/96182

Add a transform operator structured.winograd_conv2d to convert linalg.conv_2d_nhwc_fhwc to Linalg winograd operators.
[llvm-branch-commits] [mlir] [mlir][linalg] Add transform operator for Winograd Conv2D algorithm (PR #96177)
Hsiangkai wrote:

Sorry, I am still figuring out how to create stacked PRs.

https://github.com/llvm/llvm-project/pull/96177
[llvm-branch-commits] [mlir] [mlir][linalg] Decompose winograd operators (PR #96178)
Hsiangkai wrote:

Sorry, I am still figuring out how to create stacked PRs.

https://github.com/llvm/llvm-project/pull/96178
[llvm-branch-commits] [mlir] [mlir][linalg] Decompose winograd operators (PR #96178)
https://github.com/Hsiangkai closed https://github.com/llvm/llvm-project/pull/96178
[llvm-branch-commits] [mlir] [mlir][linalg] Add transform operator for Winograd Conv2D algorithm (PR #96177)
https://github.com/Hsiangkai closed https://github.com/llvm/llvm-project/pull/96177
[llvm-branch-commits] [mlir] [mlir][linalg] Implement TilingInterface for winograd operators (PR #96179)
https://github.com/Hsiangkai closed https://github.com/llvm/llvm-project/pull/96179
[llvm-branch-commits] [mlir] [mlir][linalg] Implement TilingInterface for winograd operators (PR #96179)
Hsiangkai wrote:

Sorry, I am still figuring out how to create stacked PRs.

https://github.com/llvm/llvm-project/pull/96179
[llvm-branch-commits] [mlir] [mlir][linalg] Implement TilingInterface for winograd operators (PR #96184)
https://github.com/Hsiangkai created https://github.com/llvm/llvm-project/pull/96184 In order to support arbitrary size input data of conv2d, implement TilingInterface for winograd operators. Before converting winograd operators into nested loops with matrix multiply, tile the input of conv2d into the supported size first. Add a transform operator structured.decompose_winograd_op to decompose winograd operators. Before applying the transform op, use tile_using_for to tile the input data into supported size. The test case shows how to tile and decompose winograd operators. >From 7300578082fb321a0617ed2b61202eca39989e59 Mon Sep 17 00:00:00 2001 From: Hsiangkai Wang Date: Mon, 17 Jun 2024 11:44:27 +0100 Subject: [PATCH] [mlir][linalg] Implement TilingInterface for winograd operators In order to support arbitrary size input data of conv2d, implement TilingInterface for winograd operators. Before converting winograd operators into nested loops with matrix multiply, tile the input of conv2d into the supported size first. Add a transform operator structured.decompose_winograd_op to decompose winograd operators. Before applying the transform op, use tile_using_for to tile the input data into supported size. The test case shows how to tile and decompose winograd operators. 
--- .../mlir/Dialect/Linalg/IR/LinalgOps.td | 21 +- .../Linalg/TransformOps/LinalgTransformOps.td | 37 ++ .../Dialect/Linalg/Transforms/Transforms.h| 45 +++ mlir/lib/Dialect/Linalg/IR/LinalgOps.cpp | 281 +++ .../TransformOps/LinalgTransformOps.cpp | 27 ++ .../Linalg/Transforms/WinogradConv2D.cpp | 18 + .../transform-tile-and-winograd-rewrite.mlir | 332 ++ 7 files changed, 758 insertions(+), 3 deletions(-) create mode 100644 mlir/test/Dialect/Linalg/transform-tile-and-winograd-rewrite.mlir diff --git a/mlir/include/mlir/Dialect/Linalg/IR/LinalgOps.td b/mlir/include/mlir/Dialect/Linalg/IR/LinalgOps.td index de1097b6ac27b..45726d6ee2224 100644 --- a/mlir/include/mlir/Dialect/Linalg/IR/LinalgOps.td +++ b/mlir/include/mlir/Dialect/Linalg/IR/LinalgOps.td @@ -154,7 +154,12 @@ def Linalg_SoftmaxOp : Linalg_Op<"softmax", let hasVerifier = 1; } -def Linalg_WinogradFilterTransformOp : Linalg_Op<"winograd_filter_transform"> { +def Linalg_WinogradFilterTransformOp : Linalg_Op<"winograd_filter_transform", +[DeclareOpInterfaceMethods]> { let summary = "Winograd filter transform operator"; let description = [{ Winograd Conv2D algorithm will convert linalg Conv2D operator into batched @@ -192,7 +197,12 @@ def Linalg_WinogradFilterTransformOp : Linalg_Op<"winograd_filter_transform"> { let hasVerifier = 1; } -def Linalg_WinogradInputTransformOp : Linalg_Op<"winograd_input_transform"> { +def Linalg_WinogradInputTransformOp : Linalg_Op<"winograd_input_transform", +[DeclareOpInterfaceMethods]> { let summary = "Winograd input transform operator"; let description = [{ Winograd Conv2D algorithm will convert linalg Conv2D operator into batched @@ -230,7 +240,12 @@ def Linalg_WinogradInputTransformOp : Linalg_Op<"winograd_input_transform"> { let hasVerifier = 1; } -def Linalg_WinogradOutputTransformOp : Linalg_Op<"winograd_output_transform"> { +def Linalg_WinogradOutputTransformOp : Linalg_Op<"winograd_output_transform", +[DeclareOpInterfaceMethods]> { let summary = "Winograd output 
transform operator"; let description = [{ Winograd Conv2D algorithm will convert linalg Conv2D operator into batched diff --git a/mlir/include/mlir/Dialect/Linalg/TransformOps/LinalgTransformOps.td b/mlir/include/mlir/Dialect/Linalg/TransformOps/LinalgTransformOps.td index 68d0f713caad4..71736eae38b4f 100644 --- a/mlir/include/mlir/Dialect/Linalg/TransformOps/LinalgTransformOps.td +++ b/mlir/include/mlir/Dialect/Linalg/TransformOps/LinalgTransformOps.td @@ -2638,4 +2638,41 @@ def WinogradConv2DOp : Op { + let description = [{ +Decompose winograd operators. It will convert filter, input and output +transform operators into a combination of scf, tensor, and linalg +equivalent operators. Before applying this transform operator, users +need to tile winograd transform operators into supported sizes. + + Return modes: + +This operation fails if `target` is unsupported. Otherwise, the operation +succeeds and returns a handle of the sequence that replaces the original +operator. + }]; + + let arguments = (ins TransformHandleTypeInterface:$target); + let results = (outs TransformHandleTypeInterface:$transformed); + + let assemblyFormat = +"$target attr-dict `:` functional-type($target, results)"; + + let builders = [ +OpBuilder<(ins "Value":$target)> + ]; + + let extraClassDeclaration = [{ +::mlir::DiagnosedSilenceableFailure applyToOne( +::mlir::transform::TransformRewriter &rewriter, +::mlir::Operation *target, +::mlir::transform::ApplyToEachResultList &results, +
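The tile-then-decompose flow this PR describes can be sketched as a transform script. This is an illustrative sketch, not the test file from the patch: the matched op, tile sizes, and handle names are assumptions, and the exact `tile_using_for` syntax varies between LLVM versions; only `transform.structured.decompose_winograd_op` and the `linalg.winograd_*_transform` op names come from this patch series.

```mlir
module attributes {transform.with_named_sequence} {
  transform.named_sequence @__transform_main(%arg1: !transform.any_op {transform.readonly}) {
    // Match one of the Winograd transform ops introduced by the earlier patches.
    %0 = transform.structured.match ops{["linalg.winograd_input_transform"]} in %arg1
      : (!transform.any_op) -> !transform.any_op
    // First tile the op down to a supported unit size on its tile dimensions
    // (possible because this PR implements the TilingInterface for it).
    %tiled, %loop0, %loop1 = transform.structured.tile_using_for %0 tile_sizes [1, 1]
      : (!transform.any_op) -> (!transform.any_op, !transform.any_op, !transform.any_op)
    // Then decompose the tiled op into scf/tensor/linalg equivalents.
    %1 = transform.structured.decompose_winograd_op %tiled
      : (!transform.any_op) -> (!transform.any_op)
    transform.yield
  }
}
```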
[llvm-branch-commits] [mlir] [mlir][linalg] Add transform operator for Winograd Conv2D algorithm (PR #96182)
https://github.com/Hsiangkai updated https://github.com/llvm/llvm-project/pull/96182 >From 374b0d5b83ce080bea690199380e270a36ad1c52 Mon Sep 17 00:00:00 2001 From: Hsiangkai Wang Date: Mon, 17 Jun 2024 11:49:08 +0100 Subject: [PATCH] [mlir][linalg] Add transform operator for Winograd Conv2D algorithm Add a transform operator structured.winograd_conv2d to convert linalg.conv_2d_nhwc_fhwc to Linalg winograd operators. --- .../Linalg/TransformOps/LinalgTransformOps.td | 51 +++ .../Dialect/Linalg/Transforms/Transforms.h| 7 ++ .../TransformOps/LinalgTransformOps.cpp | 25 ++ .../Linalg/Transforms/WinogradConv2D.cpp | 6 ++ .../Linalg/transform-winograd-conv2d.mlir | 88 +++ 5 files changed, 177 insertions(+) create mode 100644 mlir/test/Dialect/Linalg/transform-winograd-conv2d.mlir diff --git a/mlir/include/mlir/Dialect/Linalg/TransformOps/LinalgTransformOps.td b/mlir/include/mlir/Dialect/Linalg/TransformOps/LinalgTransformOps.td index 93e2c2db729da..68d0f713caad4 100644 --- a/mlir/include/mlir/Dialect/Linalg/TransformOps/LinalgTransformOps.td +++ b/mlir/include/mlir/Dialect/Linalg/TransformOps/LinalgTransformOps.td @@ -2587,4 +2587,55 @@ def MapCopyToThreadsOp : }]; } +//===--===// +// Winograd Conv2D +//===--===// + +def WinogradConv2DOp : Op { + let description = [{ +Winograd Conv2D algorithm will convert linalg Conv2D operator into batched +matrix multiply. Before the matrix multiply, it will convert filter and +input into a format suitable for batched matrix multiply. After the matrix +multiply, it will convert output to the final result tensor. + +The algorithm F(m x m, r x r) is + +Y = A^T x [(G x g x G^T) @ (B^T x d x B)] x A + +The size of output Y is m x m. The size of filter g is r x r. The size of +input d is (m + r - 1) x (m + r - 1). A^T, A, G^T, G, B^T, and B are +transformation matrices. + + Return modes: + +This operation fails if `target` is unsupported. 
Otherwise, the operation +succeeds and returns a handle of the sequence that replaces the original +convolution. + }]; + + let arguments = (ins TransformHandleTypeInterface:$target, + I64Attr:$m, + I64Attr:$r); + let results = (outs TransformHandleTypeInterface:$transformed); + + let assemblyFormat = +"$target attr-dict `:` functional-type($target, results)"; + + let builders = [ +OpBuilder<(ins "Value":$target)> + ]; + + let extraClassDeclaration = [{ +::mlir::DiagnosedSilenceableFailure applyToOne( +::mlir::transform::TransformRewriter &rewriter, +::mlir::linalg::LinalgOp target, +::mlir::transform::ApplyToEachResultList &results, +::mlir::transform::TransformState &state); + }]; +} + #endif // LINALG_TRANSFORM_OPS diff --git a/mlir/include/mlir/Dialect/Linalg/Transforms/Transforms.h b/mlir/include/mlir/Dialect/Linalg/Transforms/Transforms.h index 835aeaf2ffed3..da107b66257a5 100644 --- a/mlir/include/mlir/Dialect/Linalg/Transforms/Transforms.h +++ b/mlir/include/mlir/Dialect/Linalg/Transforms/Transforms.h @@ -1312,6 +1312,13 @@ FailureOr transposeBatchMatmul(RewriterBase &rewriter, linalg::BatchMatmulOp op, bool transposeLHS = true); +/// Convert linalg.conv_2d_nhwc_fhwc to Winograd Conv2D algorithm +/// F(m x m, r x r). m is the dimension size of output and r is the dimension +/// size of filter. +FailureOr winogradConv2D(RewriterBase &rewriter, + linalg::Conv2DNhwcFhwcOp op, int64_t m, + int64_t r); + //===--===// // Rewrite patterns wrapping transformations. 
// TODO: every single such pattern should be a close to noop wrapper around a diff --git a/mlir/lib/Dialect/Linalg/TransformOps/LinalgTransformOps.cpp b/mlir/lib/Dialect/Linalg/TransformOps/LinalgTransformOps.cpp index bc02788f9c441..d051b29e1f06f 100644 --- a/mlir/lib/Dialect/Linalg/TransformOps/LinalgTransformOps.cpp +++ b/mlir/lib/Dialect/Linalg/TransformOps/LinalgTransformOps.cpp @@ -3480,6 +3480,31 @@ DiagnosedSilenceableFailure transform::MapCopyToThreadsOp::applyToOne( return DiagnosedSilenceableFailure::success(); } +//===--===// +// WinogradConv2DOp +//===--===// + +DiagnosedSilenceableFailure transform::WinogradConv2DOp::applyToOne( +transform::TransformRewriter &rewriter, linalg::LinalgOp target, +transform::ApplyToEachResultList &results, +transform::TransformState &state) { + rewriter.setInsertionPoint(target); + auto maybeTransfo
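For reference, the test file added by this patch drives the new transform op as follows (this module is quoted verbatim from transform-winograd-conv2d.mlir later in this thread); F(4 x 4, 3 x 3) corresponds to m = 4, r = 3:

```mlir
module attributes {transform.with_named_sequence} {
  transform.named_sequence @__transform_main(%arg1: !transform.any_op {transform.readonly}) {
    %0 = transform.structured.match ops{["linalg.conv_2d_nhwc_fhwc"]} in %arg1 : (!transform.any_op) -> !transform.any_op
    %1 = transform.structured.winograd_conv2d %0 { m = 4, r = 3 } : (!transform.any_op) -> (!transform.any_op)
    transform.yield
  }
}
```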
[llvm-branch-commits] [mlir] [mlir][linalg] Decompose winograd operators (PR #96183)
https://github.com/Hsiangkai updated https://github.com/llvm/llvm-project/pull/96183 From 24c4f957ae673c2955fc0674f91e488813d59350 Mon Sep 17 00:00:00 2001 From: Hsiangkai Wang Date: Mon, 17 Jun 2024 17:39:49 +0100 Subject: [PATCH] [mlir][linalg] Decompose winograd operators Convert Linalg winograd_filter_transform, winograd_input_transform, and winograd_output_transform into nested loops containing matrix multiplications with constant transform matrices. Support several configurations of Winograd Conv2D, including F(2, 3), F(4, 3), and F(2, 5). These configurations show that the implementation can support different kernel sizes (3 and 5) and different output sizes (2 and 4). Besides the symmetric kernel sizes 3x3 and 5x5, this patch also supports 1x3, 3x1, 1x5, and 5x1 kernels. The implementation is based on the paper Fast Algorithms for Convolutional Neural Networks (https://arxiv.org/abs/1509.09308). --- .../Dialect/Linalg/Transforms/Transforms.h| 3 + .../Linalg/Transforms/WinogradConv2D.cpp | 773 ++ .../Linalg/winograd-conv2d-rewrite.mlir | 105 +++ .../Dialect/Linalg/TestLinalgTransforms.cpp | 11 + 4 files changed, 892 insertions(+) create mode 100644 mlir/test/Dialect/Linalg/winograd-conv2d-rewrite.mlir diff --git a/mlir/include/mlir/Dialect/Linalg/Transforms/Transforms.h b/mlir/include/mlir/Dialect/Linalg/Transforms/Transforms.h index da107b66257a5..bb7ec590faad0 100644 --- a/mlir/include/mlir/Dialect/Linalg/Transforms/Transforms.h +++ b/mlir/include/mlir/Dialect/Linalg/Transforms/Transforms.h @@ -1703,6 +1703,9 @@ void populateBlockPackMatmulPatterns(RewritePatternSet &patterns, void populateWinogradConv2DPatterns(RewritePatternSet &patterns, int64_t m, int64_t r); +/// Patterns to decompose Winograd operators. 
+void populateDecomposeWinogradOpsPatterns(RewritePatternSet &patterns); + } // namespace linalg } // namespace mlir diff --git a/mlir/lib/Dialect/Linalg/Transforms/WinogradConv2D.cpp b/mlir/lib/Dialect/Linalg/Transforms/WinogradConv2D.cpp index d1f4be8bbf29a..d245723c85646 100644 --- a/mlir/lib/Dialect/Linalg/Transforms/WinogradConv2D.cpp +++ b/mlir/lib/Dialect/Linalg/Transforms/WinogradConv2D.cpp @@ -12,7 +12,10 @@ // //===--===// +#include "mlir/Dialect/Affine/IR/AffineOps.h" +#include "mlir/Dialect/Arith/IR/Arith.h" #include "mlir/Dialect/Linalg/IR/Linalg.h" +#include "mlir/Dialect/Linalg/Transforms/Transforms.h" #include "mlir/Dialect/Tensor/IR/Tensor.h" #include "mlir/Dialect/Tosa/Utils/ConversionUtils.h" #include "mlir/Transforms/GreedyPatternRewriteDriver.h" @@ -23,6 +26,156 @@ namespace linalg { namespace { +// clang-format off +// Winograd Conv2D uses a minimal 2D filtering algorithm to calculate its +// result. The formula of minimal 2D filtering algorithm F(m x m, r x r), +// m is the output dimension and r is the filter dimension, is +// +// Y = A^T x [ (G x g x G^T) x (B^T x d x B) ] x A +// +// g is filter and d is input data. We need to prepare 6 constant +// transformation matrices, G, G^T, B^T, B, A^T, and A for this formula. 
+// +// The following tables define these constant transformation matrices for +// F(2 x 2, 3 x 3), F(4 x 4, 3 x 3), and F(2 x 2, 5 x 5) +constexpr float G_2x2_3x3[] = { + -1, 0, 0, + 1./2, -1./2, 1./2, + 1./2, 1./2, 1./2, +0, 0,1 +}; + +constexpr float GT_2x2_3x3[] = { + -1, 1./2, 1./2, 0, +0, -1./2, 1./2, 0, +0, 1./2, 1./2, 1 +}; + +constexpr float BT_2x2_3x3[] = { + -1,0, 1, 0, +0, -1, 1, 0, +0,1, 1, 0, +0, -1, 0, 1 +}; + +constexpr float B_2x2_3x3[] = { + -1,0, 0, 0, +0, -1, 1, -1, +1,1, 1, 0, +0,0, 0, 1 +}; + +constexpr float AT_2x2_3x3[] = { +1,1, 1, 0, +0, -1, 1, 1 +}; + +constexpr float A_2x2_3x3[] = { +1,0, +1, -1, +1,1, +0,1 +}; + +constexpr float G_4x4_3x3[] = { + 1, 0, 0, + -1./3, 1./3, -1./3, + -1./3, -1./3, -1./3, + 1./12, -1./6, 1./3, + 1./12, 1./6, 1./3, + 0, 0, 1 +}; + +constexpr float GT_4x4_3x3[] = { + 1, -1./3, -1./3, 1./12, 1./12, 0, + 0, 1./3, -1./3, -1./6, 1./6, 0, + 0, -1./3, -1./3, 1./3, 1./3, 1 +}; + +constexpr float BT_4x4_3x3[] = { + 1./4, 0, -5./16, 0, 1./16, 0, +0, 1./4, -1./4, -1./16, 1./16, 0, +0, -1./4, -1./4, 1./16, 1./16, 0, +0, 1./4, -1./8, -1./4, 1./8, 0, +0, -1./4, -1./8, 1./4, 1./8, 0, +0, 1./4, 0, -5./16, 0, 1./16 +}; + +constexpr float B_4x4_3x3[] = { + 1./4, 0, 0, 0, 0, 0, + 0, 1./4, -1./4, 1./4, -1./4, 1./4, + -5./16, -1./4, -1./4, -1./8, -1./8, 0, + 0, -1./16, 1./16, -1./4, 1./4, -5./16, + 1./16, 1./16, 1./16, 1./8, 1./8, 0, + 0, 0, 0, 0, 0, 1./16 +}; + +constexpr float AT_4x4_3x3[] = { + 1./8, 1./4,
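A shape summary may help when reading these tables. Writing the tile size as alpha = m + r - 1 (so alpha = 4 for F(2, 3) and alpha = 6 for F(4, 3) and F(2, 5)), the dimensions below are consistent with the arrays above (e.g. G_2x2_3x3 is 4 x 3, AT_2x2_3x3 is 2 x 4); the element-wise product is written with an explicit Hadamard symbol here:

```latex
% Shape bookkeeping for the minimal 2D filtering algorithm F(m x m, r x r).
\begin{aligned}
Y &= A^T \left[ (G \, g \, G^T) \odot (B^T d \, B) \right] A, \qquad \alpha = m + r - 1 \\
g &\in \mathbb{R}^{r \times r}, \quad
d \in \mathbb{R}^{\alpha \times \alpha}, \quad
Y \in \mathbb{R}^{m \times m} \\
G &\in \mathbb{R}^{\alpha \times r}, \quad
B^T \in \mathbb{R}^{\alpha \times \alpha}, \quad
A^T \in \mathbb{R}^{m \times \alpha}
\end{aligned}
```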
[llvm-branch-commits] [mlir] [mlir][linalg] Implement TilingInterface for winograd operators (PR #96184)
https://github.com/Hsiangkai updated https://github.com/llvm/llvm-project/pull/96184 >From 73b524b7746839614655fd8082dbda297e93ba72 Mon Sep 17 00:00:00 2001 From: Hsiangkai Wang Date: Mon, 17 Jun 2024 11:44:27 +0100 Subject: [PATCH] [mlir][linalg] Implement TilingInterface for winograd operators In order to support arbitrary size input data of conv2d, implement TilingInterface for winograd operators. Before converting winograd operators into nested loops with matrix multiply, tile the input of conv2d into the supported size first. Add a transform operator structured.decompose_winograd_op to decompose winograd operators. Before applying the transform op, use tile_using_for to tile the input data into supported size. The test case shows how to tile and decompose winograd operators. --- .../mlir/Dialect/Linalg/IR/LinalgOps.td | 21 +- .../Linalg/TransformOps/LinalgTransformOps.td | 37 ++ .../Dialect/Linalg/Transforms/Transforms.h| 45 +++ mlir/lib/Dialect/Linalg/IR/LinalgOps.cpp | 281 +++ .../TransformOps/LinalgTransformOps.cpp | 27 ++ .../Linalg/Transforms/WinogradConv2D.cpp | 18 + .../transform-tile-and-winograd-rewrite.mlir | 332 ++ 7 files changed, 758 insertions(+), 3 deletions(-) create mode 100644 mlir/test/Dialect/Linalg/transform-tile-and-winograd-rewrite.mlir diff --git a/mlir/include/mlir/Dialect/Linalg/IR/LinalgOps.td b/mlir/include/mlir/Dialect/Linalg/IR/LinalgOps.td index de1097b6ac27b..45726d6ee2224 100644 --- a/mlir/include/mlir/Dialect/Linalg/IR/LinalgOps.td +++ b/mlir/include/mlir/Dialect/Linalg/IR/LinalgOps.td @@ -154,7 +154,12 @@ def Linalg_SoftmaxOp : Linalg_Op<"softmax", let hasVerifier = 1; } -def Linalg_WinogradFilterTransformOp : Linalg_Op<"winograd_filter_transform"> { +def Linalg_WinogradFilterTransformOp : Linalg_Op<"winograd_filter_transform", +[DeclareOpInterfaceMethods]> { let summary = "Winograd filter transform operator"; let description = [{ Winograd Conv2D algorithm will convert linalg Conv2D operator into batched @@ -192,7 +197,12 @@ 
def Linalg_WinogradFilterTransformOp : Linalg_Op<"winograd_filter_transform"> { let hasVerifier = 1; } -def Linalg_WinogradInputTransformOp : Linalg_Op<"winograd_input_transform"> { +def Linalg_WinogradInputTransformOp : Linalg_Op<"winograd_input_transform", +[DeclareOpInterfaceMethods]> { let summary = "Winograd input transform operator"; let description = [{ Winograd Conv2D algorithm will convert linalg Conv2D operator into batched @@ -230,7 +240,12 @@ def Linalg_WinogradInputTransformOp : Linalg_Op<"winograd_input_transform"> { let hasVerifier = 1; } -def Linalg_WinogradOutputTransformOp : Linalg_Op<"winograd_output_transform"> { +def Linalg_WinogradOutputTransformOp : Linalg_Op<"winograd_output_transform", +[DeclareOpInterfaceMethods]> { let summary = "Winograd output transform operator"; let description = [{ Winograd Conv2D algorithm will convert linalg Conv2D operator into batched diff --git a/mlir/include/mlir/Dialect/Linalg/TransformOps/LinalgTransformOps.td b/mlir/include/mlir/Dialect/Linalg/TransformOps/LinalgTransformOps.td index 68d0f713caad4..71736eae38b4f 100644 --- a/mlir/include/mlir/Dialect/Linalg/TransformOps/LinalgTransformOps.td +++ b/mlir/include/mlir/Dialect/Linalg/TransformOps/LinalgTransformOps.td @@ -2638,4 +2638,41 @@ def WinogradConv2DOp : Op { + let description = [{ +Decompose winograd operators. It will convert filter, input and output +transform operators into a combination of scf, tensor, and linalg +equivalent operators. Before applying this transform operator, users +need to tile winograd transform operators into supported sizes. + + Return modes: + +This operation fails if `target` is unsupported. Otherwise, the operation +succeeds and returns a handle of the sequence that replaces the original +operator. 
+ }]; + + let arguments = (ins TransformHandleTypeInterface:$target); + let results = (outs TransformHandleTypeInterface:$transformed); + + let assemblyFormat = +"$target attr-dict `:` functional-type($target, results)"; + + let builders = [ +OpBuilder<(ins "Value":$target)> + ]; + + let extraClassDeclaration = [{ +::mlir::DiagnosedSilenceableFailure applyToOne( +::mlir::transform::TransformRewriter &rewriter, +::mlir::Operation *target, +::mlir::transform::ApplyToEachResultList &results, +::mlir::transform::TransformState &state); + }]; +} + #endif // LINALG_TRANSFORM_OPS diff --git a/mlir/include/mlir/Dialect/Linalg/Transforms/Transforms.h b/mlir/include/mlir/Dialect/Linalg/Transforms/Transforms.h index bb7ec590faad0..d0eec2be1f8fb 100644 --- a/mlir/include/mlir/Dialect/Linalg/Transforms/Transforms.h +++ b/mlir/include/mlir/Dialect/Linalg/Transforms/Transforms.h @@ -1319,6 +1319,51 @@ FailureOr winogradConv2D(RewriterBase &rewriter, linal
[llvm-branch-commits] [mlir] [mlir][linalg] Implement TilingInterface for winograd operators (PR #96184)
https://github.com/Hsiangkai updated https://github.com/llvm/llvm-project/pull/96184 >From 73b524b7746839614655fd8082dbda297e93ba72 Mon Sep 17 00:00:00 2001 From: Hsiangkai Wang Date: Mon, 17 Jun 2024 11:44:27 +0100 Subject: [PATCH 1/2] [mlir][linalg] Implement TilingInterface for winograd operators In order to support arbitrary size input data of conv2d, implement TilingInterface for winograd operators. Before converting winograd operators into nested loops with matrix multiply, tile the input of conv2d into the supported size first. Add a transform operator structured.decompose_winograd_op to decompose winograd operators. Before applying the transform op, use tile_using_for to tile the input data into supported size. The test case shows how to tile and decompose winograd operators. --- .../mlir/Dialect/Linalg/IR/LinalgOps.td | 21 +- .../Linalg/TransformOps/LinalgTransformOps.td | 37 ++ .../Dialect/Linalg/Transforms/Transforms.h| 45 +++ mlir/lib/Dialect/Linalg/IR/LinalgOps.cpp | 281 +++ .../TransformOps/LinalgTransformOps.cpp | 27 ++ .../Linalg/Transforms/WinogradConv2D.cpp | 18 + .../transform-tile-and-winograd-rewrite.mlir | 332 ++ 7 files changed, 758 insertions(+), 3 deletions(-) create mode 100644 mlir/test/Dialect/Linalg/transform-tile-and-winograd-rewrite.mlir diff --git a/mlir/include/mlir/Dialect/Linalg/IR/LinalgOps.td b/mlir/include/mlir/Dialect/Linalg/IR/LinalgOps.td index de1097b6ac27b..45726d6ee2224 100644 --- a/mlir/include/mlir/Dialect/Linalg/IR/LinalgOps.td +++ b/mlir/include/mlir/Dialect/Linalg/IR/LinalgOps.td @@ -154,7 +154,12 @@ def Linalg_SoftmaxOp : Linalg_Op<"softmax", let hasVerifier = 1; } -def Linalg_WinogradFilterTransformOp : Linalg_Op<"winograd_filter_transform"> { +def Linalg_WinogradFilterTransformOp : Linalg_Op<"winograd_filter_transform", +[DeclareOpInterfaceMethods]> { let summary = "Winograd filter transform operator"; let description = [{ Winograd Conv2D algorithm will convert linalg Conv2D operator into batched @@ -192,7 +197,12 
@@ def Linalg_WinogradFilterTransformOp : Linalg_Op<"winograd_filter_transform"> { let hasVerifier = 1; } -def Linalg_WinogradInputTransformOp : Linalg_Op<"winograd_input_transform"> { +def Linalg_WinogradInputTransformOp : Linalg_Op<"winograd_input_transform", +[DeclareOpInterfaceMethods]> { let summary = "Winograd input transform operator"; let description = [{ Winograd Conv2D algorithm will convert linalg Conv2D operator into batched @@ -230,7 +240,12 @@ def Linalg_WinogradInputTransformOp : Linalg_Op<"winograd_input_transform"> { let hasVerifier = 1; } -def Linalg_WinogradOutputTransformOp : Linalg_Op<"winograd_output_transform"> { +def Linalg_WinogradOutputTransformOp : Linalg_Op<"winograd_output_transform", +[DeclareOpInterfaceMethods]> { let summary = "Winograd output transform operator"; let description = [{ Winograd Conv2D algorithm will convert linalg Conv2D operator into batched diff --git a/mlir/include/mlir/Dialect/Linalg/TransformOps/LinalgTransformOps.td b/mlir/include/mlir/Dialect/Linalg/TransformOps/LinalgTransformOps.td index 68d0f713caad4..71736eae38b4f 100644 --- a/mlir/include/mlir/Dialect/Linalg/TransformOps/LinalgTransformOps.td +++ b/mlir/include/mlir/Dialect/Linalg/TransformOps/LinalgTransformOps.td @@ -2638,4 +2638,41 @@ def WinogradConv2DOp : Op { + let description = [{ +Decompose winograd operators. It will convert filter, input and output +transform operators into a combination of scf, tensor, and linalg +equivalent operators. Before applying this transform operator, users +need to tile winograd transform operators into supported sizes. + + Return modes: + +This operation fails if `target` is unsupported. Otherwise, the operation +succeeds and returns a handle of the sequence that replaces the original +operator. 
+ }]; + + let arguments = (ins TransformHandleTypeInterface:$target); + let results = (outs TransformHandleTypeInterface:$transformed); + + let assemblyFormat = +"$target attr-dict `:` functional-type($target, results)"; + + let builders = [ +OpBuilder<(ins "Value":$target)> + ]; + + let extraClassDeclaration = [{ +::mlir::DiagnosedSilenceableFailure applyToOne( +::mlir::transform::TransformRewriter &rewriter, +::mlir::Operation *target, +::mlir::transform::ApplyToEachResultList &results, +::mlir::transform::TransformState &state); + }]; +} + #endif // LINALG_TRANSFORM_OPS diff --git a/mlir/include/mlir/Dialect/Linalg/Transforms/Transforms.h b/mlir/include/mlir/Dialect/Linalg/Transforms/Transforms.h index bb7ec590faad0..d0eec2be1f8fb 100644 --- a/mlir/include/mlir/Dialect/Linalg/Transforms/Transforms.h +++ b/mlir/include/mlir/Dialect/Linalg/Transforms/Transforms.h @@ -1319,6 +1319,51 @@ FailureOr winogradConv2D(RewriterBase &rewriter, l
[llvm-branch-commits] [mlir] [mlir][linalg] Add transform operator for Winograd Conv2D algorithm (PR #96182)
@@ -0,0 +1,88 @@ +// RUN: mlir-opt %s -transform-interpreter -canonicalize --split-input-file | FileCheck %s + +func.func @conv2d(%arg0: tensor<2x10x10x5xf32>, %arg1: tensor<2x3x3x5xf32>, %arg2: tensor<1xf32>) -> tensor<2x8x8x2xf32> { + %0 = tensor.empty() : tensor<2x8x8x2xf32> + %1 = linalg.generic {indexing_maps = [affine_map<(d0, d1, d2, d3) -> (0)>, affine_map<(d0, d1, d2, d3) -> (d0, d1, d2, d3)>], iterator_types = ["parallel", "parallel", "parallel", "parallel"]} ins(%arg2 : tensor<1xf32>) outs(%0 : tensor<2x8x8x2xf32>) { + ^bb0(%in: f32, %out: f32): +linalg.yield %in : f32 + } -> tensor<2x8x8x2xf32> + %2 = linalg.conv_2d_nhwc_fhwc {dilations = dense<1> : tensor<2xi64>, strides = dense<1> : tensor<2xi64>} ins(%arg0, %arg1 : tensor<2x10x10x5xf32>, tensor<2x3x3x5xf32>) outs(%1 : tensor<2x8x8x2xf32>) -> tensor<2x8x8x2xf32> + return %2 : tensor<2x8x8x2xf32> +} + +module attributes {transform.with_named_sequence} { + transform.named_sequence @__transform_main(%arg1: !transform.any_op {transform.readonly}) { +%0 = transform.structured.match ops{["linalg.conv_2d_nhwc_fhwc"]} in %arg1 : (!transform.any_op) -> !transform.any_op +%1 = transform.structured.winograd_conv2d %0 { m = 4, r = 3 } : (!transform.any_op) -> (!transform.any_op) +transform.yield + } +} + +// CHECK: #[[$MAP0:.+]] = affine_map<(d0, d1, d2, d3) -> (0)> +// CHECK: #[[$MAP1:.+]] = affine_map<(d0, d1, d2, d3) -> (d0, d1, d2, d3)> +// CHECK-LABEL: func.func @conv2d +// CHECK-SAME: (%[[ARG0:.*]]: tensor<2x10x10x5xf32>, %[[ARG1:.*]]: tensor<2x3x3x5xf32>, %[[ARG2:.*]]: tensor<1xf32>) -> tensor<2x8x8x2xf32> { +// CHECK:%[[S0:.*]] = tensor.empty() : tensor<2x8x8x2xf32> +// CHECK-NEXT: %[[S1:.*]] = linalg.generic {indexing_maps = [#[[$MAP0]], #[[$MAP1]]], iterator_types = ["parallel", "parallel", "parallel", "parallel"]} ins(%[[ARG2]] : tensor<1xf32>) outs(%[[S0]] : tensor<2x8x8x2xf32>) { +// CHECK-NEXT: ^bb0(%[[IN:.*]]: f32, %[[OUT:.*]]: f32): +// CHECK-NEXT: linalg.yield %[[IN]] : f32 +// CHECK-NEXT: } -> 
tensor<2x8x8x2xf32> +// CHECK-NEXT: %[[S2:.*]] = tensor.empty() : tensor<2x2x6x6x5x2xf32> +// CHECK-NEXT: %[[S3:.*]] = linalg.winograd_filter_transform m(4) r(3) ins(%[[ARG1]] : tensor<2x3x3x5xf32>) outs(%[[S2]] : tensor<2x2x6x6x5x2xf32>) -> tensor<2x2x6x6x5x2xf32> +// CHECK-NEXT: %[[S4:.*]] = tensor.empty() : tensor<2x2x6x6x2x5xf32> +// CHECK-NEXT: %[[S5:.*]] = linalg.winograd_input_transform m(4) r(3) ins(%[[ARG0]] : tensor<2x10x10x5xf32>) outs(%[[S4]] : tensor<2x2x6x6x2x5xf32>) -> tensor<2x2x6x6x2x5xf32> +// CHECK-NEXT: %[[COLLAPSED:.*]] = tensor.collapse_shape %[[S3]] {{\[}}[0, 1, 2, 3], [4], [5]] : tensor<2x2x6x6x5x2xf32> into tensor<144x5x2xf32> +// CHECK-NEXT: %[[COLLAPSED_0:.*]] = tensor.collapse_shape %[[S5]] {{\[}}[0, 1, 2, 3], [4], [5]] : tensor<2x2x6x6x2x5xf32> into tensor<144x2x5xf32> +// CHECK-NEXT: %[[S6:.*]] = tensor.empty() : tensor<144x2x2xf32> +// CHECK-NEXT: %[[S7:.*]] = linalg.batch_matmul ins(%[[COLLAPSED_0]], %[[COLLAPSED]] : tensor<144x2x5xf32>, tensor<144x5x2xf32>) outs(%[[S6]] : tensor<144x2x2xf32>) -> tensor<144x2x2xf32> +// CHECK-NEXT: %[[EXPANDED:.*]] = tensor.expand_shape %[[S7]] {{\[}}[0, 1, 2, 3], [4], [5]] output_shape [2, 2, 6, 6, 2, 2] : tensor<144x2x2xf32> into tensor<2x2x6x6x2x2xf32> +// CHECK-NEXT: %[[S8:.*]] = linalg.winograd_output_transform m(4) r(3) ins(%[[EXPANDED]] : tensor<2x2x6x6x2x2xf32>) outs(%[[S1]] : tensor<2x8x8x2xf32>) -> tensor<2x8x8x2xf32> +// CHECK-NEXT: return %[[S8]] : tensor<2x8x8x2xf32> +// CHECK-NEXT: } + +// - + +func.func @conv2d_unaligned(%arg0: tensor<2x11x11x5xf32>, %arg1: tensor<2x3x3x5xf32>, %arg2: tensor<1xf32>) -> tensor<2x9x9x2xf32> { + %0 = tensor.empty() : tensor<2x9x9x2xf32> + %1 = linalg.generic {indexing_maps = [affine_map<(d0, d1, d2, d3) -> (0)>, affine_map<(d0, d1, d2, d3) -> (d0, d1, d2, d3)>], iterator_types = ["parallel", "parallel", "parallel", "parallel"]} ins(%arg2 : tensor<1xf32>) outs(%0 : tensor<2x9x9x2xf32>) { + ^bb0(%in: f32, %out: f32): +linalg.yield %in : f32 + } -> 
tensor<2x9x9x2xf32> + %2 = linalg.conv_2d_nhwc_fhwc {dilations = dense<1> : tensor<2xi64>, strides = dense<1> : tensor<2xi64>} ins(%arg0, %arg1 : tensor<2x11x11x5xf32>, tensor<2x3x3x5xf32>) outs(%1 : tensor<2x9x9x2xf32>) -> tensor<2x9x9x2xf32> + return %2 : tensor<2x9x9x2xf32> +} + +module attributes {transform.with_named_sequence} { + transform.named_sequence @__transform_main(%arg1: !transform.any_op {transform.readonly}) { +%0 = transform.structured.match ops{["linalg.conv_2d_nhwc_fhwc"]} in %arg1 : (!transform.any_op) -> !transform.any_op +%1 = transform.structured.winograd_conv2d %0 { m = 4, r = 3 } : (!transform.any_op) -> (!transform.any_op) +transform.yield + } +} + +// CHECK: #[[$MAP0:.+]] = affine_map<(d0, d1, d2, d3) -> (0)> +// CHECK: #[[$MAP1:.+]] = affine_map<(d0, d1, d2, d3) -> (d0, d1, d2, d3)> +// CHECK-LABEL: func.func @conv2d_unaligned
[llvm-branch-commits] [mlir] [mlir][linalg] Add transform operator for Winograd Conv2D algorithm (PR #96182)
Hsiangkai wrote: Done. https://github.com/llvm/llvm-project/pull/96182
[llvm-branch-commits] [mlir] [mlir][linalg] Add transform operator for Winograd Conv2D algorithm (PR #96182)
@@ -3480,6 +3480,31 @@ DiagnosedSilenceableFailure transform::MapCopyToThreadsOp::applyToOne( return DiagnosedSilenceableFailure::success(); } +//===--===// +// WinogradConv2DOp +//===--===// + +DiagnosedSilenceableFailure transform::WinogradConv2DOp::applyToOne( +transform::TransformRewriter &rewriter, linalg::LinalgOp target, +transform::ApplyToEachResultList &results, +transform::TransformState &state) { + rewriter.setInsertionPoint(target); + auto maybeTransformed = + TypeSwitch<Operation *, FailureOr<Operation *>>(target) + .Case([&](linalg::Conv2DNhwcFhwcOp op) { +return winogradConv2D(rewriter, op, getM(), getR()); + }) + .Default([&](Operation *op) { +return rewriter.notifyMatchFailure(op, "not supported"); Hsiangkai wrote: Use `emitError` to output error messages. https://github.com/llvm/llvm-project/pull/96182
[llvm-branch-commits] [mlir] [mlir][linalg] Add transform operator for Winograd Conv2D algorithm (PR #96182)
@@ -2587,4 +2587,55 @@ ... +This operation fails if `target` is unsupported. Otherwise, the operation Hsiangkai wrote: Fixed. https://github.com/llvm/llvm-project/pull/96182
[llvm-branch-commits] [mlir] [mlir][linalg] Add transform operator for Winograd Conv2D algorithm (PR #96182)
@@ -2587,4 +2587,55 @@ def MapCopyToThreadsOp : }]; } +//===--===// +// Winograd Conv2D +//===--===// + +def WinogradConv2DOp : Op { + let description = [{ +Winograd Conv2D algorithm will convert linalg Conv2D operator into batched Hsiangkai wrote: Fixed. https://github.com/llvm/llvm-project/pull/96182
[llvm-branch-commits] [mlir] [mlir][linalg] Add transform operator for Winograd Conv2D algorithm (PR #96182)
https://github.com/Hsiangkai edited https://github.com/llvm/llvm-project/pull/96182
[llvm-branch-commits] [mlir] [mlir][linalg] Decompose winograd operators (PR #96183)
@@ -48,6 +287,261 @@ Value collapse2DData(RewriterBase &rewriter, Location loc, Value data) { reassociation); } +// This function transforms the filter. The data layout of the filter is FHWC. +// The transformation matrix is 2-dimension. We need to extract H x W from +// FHWC first. We need to generate 2 levels of loops to iterate on F and C. +// After the transformation, we get +// +// scf.for %f = lo_f to hi_f step 1 +// scf.for %c = lo_c to hi_c step 1 +// %extracted = extract filter from filter +// %ret = linalg.matmul G, %extracted +// %ret = linalg.matmul %ret, GT +// %inserted = insert %ret into filter +// +Value filterTransform(RewriterBase &rewriter, Location loc, Value filter, + Value retValue, int64_t m, int64_t r, + bool leftTransform = true, bool rightTransform = true) { + // Map from (m, r) to G transform matrix. + static const llvm::SmallDenseMap + GMatrices = { + {F_2_3, TransformMatrix(G_2x2_3x3, 4, 3)}, + {F_4_3, TransformMatrix(G_4x4_3x3, 6, 3)}, + {F_2_5, TransformMatrix(G_2x2_5x5, 6, 5)}, + }; + + // Map from (m, r) to GT transform matrix. 
+ static const llvm::SmallDenseMap + GTMatrices = { + {F_2_3, TransformMatrix(GT_2x2_3x3, 3, 4)}, + {F_4_3, TransformMatrix(GT_4x4_3x3, 3, 6)}, + {F_2_5, TransformMatrix(GT_2x2_5x5, 5, 6)}, + }; + + auto filterType = cast(filter.getType()); + Type elementType = filterType.getElementType(); + auto filterShape = filterType.getShape(); // F, H, W, C + int64_t filterF = filterShape[0]; + int64_t filterH = filterShape[1]; + int64_t filterW = filterShape[2]; + int64_t filterC = filterShape[3]; + + if (filterH != r && filterH != 1) +return Value(); + if (filterW != r && filterW != 1) +return Value(); + + // Return shape is + auto zeroIdx = rewriter.create(loc, 0); + auto fUpperBound = rewriter.create(loc, filterF); + auto cUpperBound = rewriter.create(loc, filterC); + auto oneStep = rewriter.create(loc, 1); + auto outerForOp = + rewriter.create(loc, zeroIdx, fUpperBound, oneStep, retValue); + Block *outerForBody = outerForOp.getBody(); + rewriter.setInsertionPointToStart(outerForBody); + Value FIter = outerForBody->getArgument(0); + + auto innerForOp = rewriter.create( + loc, zeroIdx, cUpperBound, oneStep, outerForOp.getRegionIterArgs()[0]); + Block *innerForBody = innerForOp.getBody(); + rewriter.setInsertionPointToStart(innerForBody); + Value CIter = innerForBody->getArgument(0); + + // Extract (H, W) from (F, H, W, C) + auto extractFilter = extract2DData( + rewriter, loc, filter, FIter, CIter, /*outLoopIdx=*/0, + /*inLoopIdx=*/3, /*heightIdx=*/1, /*widthIdx=*/2, /*srcSize=*/4); + + TransformMapKeyTy key = {m, r}; + int64_t retRows = 1; + Value matmulRetValue = extractFilter; + if (leftTransform) { +// Get constant transform matrix G +auto it = GMatrices.find(key); +if (it == GMatrices.end()) + return Value(); +const TransformMatrix &GMatrix = it->second; + +retRows = GMatrix.rows; +auto matmulType = RankedTensorType::get({retRows, filterW}, elementType); +auto init = rewriter.create(loc, matmulType.getShape(), + elementType); + +Value G = 
create2DTransformMatrix(rewriter, loc, GMatrix, elementType); Hsiangkai wrote: There is a `ConstantOpInterface` that can convert `arith.constant` to `memref.get_global` after bufferization. https://github.com/llvm/llvm-project/pull/96183 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [mlir] [mlir][linalg] Decompose winograd operators (PR #96183)
@@ -23,6 +26,156 @@ namespace linalg { namespace { +// clang-format off +// Winograd Conv2D uses a minimal 2D filtering algorithm to calculate its +// result. The formula of minimal 2D filtering algorithm F(m x m, r x r), +// m is the output dimension and r is the filter dimension, is +// +// Y = A^T x [ (G x g x G^T) x (B^T x d x B) ] x A +// +// g is filter and d is input data. We need to prepare 6 constant +// transformation matrices, G, G^T, B^T, B, A^T, and A for this formula. +// +// The following tables define these constant transformation matrices for +// F(2 x 2, 3 x 3), F(4 x 4, 3 x 3), and F(2 x 2, 5 x 5) +constexpr float G_2x2_3x3[] = { + -1, 0, 0, + 1./2, -1./2, 1./2, + 1./2, 1./2, 1./2, +0, 0,1 +}; + +constexpr float GT_2x2_3x3[] = { + -1, 1./2, 1./2, 0, +0, -1./2, 1./2, 0, +0, 1./2, 1./2, 1 +}; Hsiangkai wrote: Can you elaborate it a bit more? I am not sure what the idea is here. Thank you. https://github.com/llvm/llvm-project/pull/96183 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [mlir] [mlir][linalg] Decompose winograd operators (PR #96183)
@@ -36,6 +189,92 @@ constexpr TransformMapKeyTy F_2_3{2, 3}; constexpr TransformMapKeyTy F_4_3{4, 3}; constexpr TransformMapKeyTy F_2_5{2, 5}; +struct TransformMatrix { Hsiangkai wrote: Done. https://github.com/llvm/llvm-project/pull/96183
[llvm-branch-commits] [mlir] [mlir][linalg] Decompose winograd operators (PR #96183)
@@ -36,6 +189,92 @@ constexpr TransformMapKeyTy F_2_3{2, 3}; constexpr TransformMapKeyTy F_4_3{4, 3}; constexpr TransformMapKeyTy F_2_5{2, 5}; +struct TransformMatrix { + TransformMatrix(const float *table, int64_t rows, int64_t cols, + int64_t scalarFactor = 1) + : table(table), rows(rows), cols(cols), scalarFactor(scalarFactor) {} + + const float *table; + int64_t rows; + int64_t cols; + int64_t scalarFactor; +}; + +Value create2DTransformMatrix(RewriterBase &rewriter, Location loc, + TransformMatrix transform, Type type) { + ArrayRef const_vec(transform.table, transform.rows * transform.cols); Hsiangkai wrote: Fixed. https://github.com/llvm/llvm-project/pull/96183 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [mlir] [mlir][linalg] Decompose winograd operators (PR #96183)
@@ -48,6 +287,261 @@ Value collapse2DData(RewriterBase &rewriter, Location loc, Value data) { reassociation); } +// This function transforms the filter. The data layout of the filter is FHWC. +// The transformation matrix is 2-dimension. We need to extract H x W from +// FHWC first. We need to generate 2 levels of loops to iterate on F and C. +// After the transformation, we get +// +// scf.for %f = lo_f to hi_f step 1 +// scf.for %c = lo_c to hi_c step 1 +// %extracted = extract filter from filter +// %ret = linalg.matmul G, %extracted +// %ret = linalg.matmul %ret, GT +// %inserted = insert %ret into filter +// Hsiangkai wrote: Fixed. https://github.com/llvm/llvm-project/pull/96183 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [mlir] [mlir][linalg] Decompose winograd operators (PR #96183)
@@ -48,6 +287,261 @@ Value collapse2DData(RewriterBase &rewriter, Location loc, Value data) { reassociation); } +// This function transforms the filter. The data layout of the filter is FHWC. +// The transformation matrix is 2-dimension. We need to extract H x W from +// FHWC first. We need to generate 2 levels of loops to iterate on F and C. +// After the transformation, we get +// +// scf.for %f = lo_f to hi_f step 1 +// scf.for %c = lo_c to hi_c step 1 +// %extracted = extract filter from filter +// %ret = linalg.matmul G, %extracted +// %ret = linalg.matmul %ret, GT +// %inserted = insert %ret into filter +// +Value filterTransform(RewriterBase &rewriter, Location loc, Value filter, + Value retValue, int64_t m, int64_t r, + bool leftTransform = true, bool rightTransform = true) { + // Map from (m, r) to G transform matrix. + static const llvm::SmallDenseMap + GMatrices = { + {F_2_3, TransformMatrix(G_2x2_3x3, 4, 3)}, + {F_4_3, TransformMatrix(G_4x4_3x3, 6, 3)}, + {F_2_5, TransformMatrix(G_2x2_5x5, 6, 5)}, + }; + + // Map from (m, r) to GT transform matrix. 
+ static const llvm::SmallDenseMap + GTMatrices = { + {F_2_3, TransformMatrix(GT_2x2_3x3, 3, 4)}, + {F_4_3, TransformMatrix(GT_4x4_3x3, 3, 6)}, + {F_2_5, TransformMatrix(GT_2x2_5x5, 5, 6)}, + }; + + auto filterType = cast(filter.getType()); + Type elementType = filterType.getElementType(); + auto filterShape = filterType.getShape(); // F, H, W, C + int64_t filterF = filterShape[0]; + int64_t filterH = filterShape[1]; + int64_t filterW = filterShape[2]; + int64_t filterC = filterShape[3]; + + if (filterH != r && filterH != 1) +return Value(); + if (filterW != r && filterW != 1) +return Value(); + + // Return shape is + auto zeroIdx = rewriter.create(loc, 0); + auto fUpperBound = rewriter.create(loc, filterF); + auto cUpperBound = rewriter.create(loc, filterC); + auto oneStep = rewriter.create(loc, 1); + auto outerForOp = + rewriter.create(loc, zeroIdx, fUpperBound, oneStep, retValue); + Block *outerForBody = outerForOp.getBody(); + rewriter.setInsertionPointToStart(outerForBody); + Value FIter = outerForBody->getArgument(0); Hsiangkai wrote: I use buildLoopNest to create loops and use a callback to construct inner most loop body. https://github.com/llvm/llvm-project/pull/96183 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [mlir] [mlir][linalg] Decompose winograd operators (PR #96183)
@@ -100,6 +594,161 @@ Value matrixMultiply(RewriterBase &rewriter, Location loc, return expandOutput; } +// This function transforms the output. The data layout of the output is HWNF. +// The transformation matrix is 2-dimension. We need to extract H x W from +// HWNF first. We need to generate 2 levels of loops to iterate on N and F. +// After the transformation, we get +// +// scf.for %n = lo_n to hi_n step 1 +// scf.for %f = lo_f to hi_f step 1 +// %extracted = extract input from result +// %ret = linalg.matmul AT, %extracted +// %ret = linalg.matmul %ret, A +// %inserted = insert %ret into ret +// +Value outputTransform(RewriterBase &rewriter, Location loc, Value value, + Value output, int64_t m, int64_t r, + bool leftTransform = true, bool rightTransform = true) { + // Map from (m, r) to AT transform matrix. + static const llvm::SmallDenseMap + ATMatrices = { + {F_2_3, TransformMatrix(AT_2x2_3x3, 2, 4)}, + {F_4_3, TransformMatrix(AT_4x4_3x3, 4, 6, 32)}, + {F_2_5, TransformMatrix(AT_2x2_5x5, 2, 6, 16)}, + }; + + // Map from (m, r) to A transform matrix. + static const llvm::SmallDenseMap + AMatrices = { + {F_2_3, TransformMatrix(A_2x2_3x3, 4, 2)}, + {F_4_3, TransformMatrix(A_4x4_3x3, 6, 4, 32)}, + {F_2_5, TransformMatrix(A_2x2_5x5, 6, 2, 16)}, + }; + + auto valueType = cast(value.getType()); + Type elementType = valueType.getElementType(); + auto valueShape = valueType.getShape(); // TileH, TileW, H, W, N, F + int64_t valueH = valueShape[2]; + int64_t valueW = valueShape[3]; + int64_t valueN = valueShape[4]; + int64_t valueF = valueShape[5]; + int64_t alphaH = leftTransform ? m + r - 1 : 1; + int64_t alphaW = rightTransform ? 
m + r - 1 : 1; + + if (valueH != alphaH && valueH != 1) +return Value(); + if (valueW != alphaW && valueW != 1) +return Value(); + + auto zeroIdx = rewriter.create(loc, 0); + auto nUpperBound = rewriter.create(loc, valueN); + auto fUpperBound = rewriter.create(loc, valueF); + auto oneStep = rewriter.create(loc, 1); + + auto outerForOp = + rewriter.create(loc, zeroIdx, nUpperBound, oneStep, output); + Block *outerForBody = outerForOp.getBody(); + rewriter.setInsertionPointToStart(outerForBody); + Value NIter = outerForBody->getArgument(0); + + auto innerForOp = rewriter.create( + loc, zeroIdx, fUpperBound, oneStep, outerForOp.getRegionIterArgs()[0]); + Block *innerForBody = innerForOp.getBody(); + rewriter.setInsertionPointToStart(innerForBody); + Value FIter = innerForBody->getArgument(0); Hsiangkai wrote: Done. https://github.com/llvm/llvm-project/pull/96183 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [mlir] [mlir][linalg] Decompose winograd operators (PR #96183)
@@ -289,6 +938,123 @@ FailureOr winogradConv2DHelper(RewriterBase &rewriter, return transformedOutput.getDefiningOp(); } +FailureOr +decomposeWinogradFilterTransformHelper(RewriterBase &rewriter, + linalg::WinogradFilterTransformOp op) { + Location loc = op.getLoc(); + Value filter = op.getFilter(); + auto filterType = cast(filter.getType()); + auto filterShape = filterType.getShape(); + int64_t filterH = filterShape[1]; + int64_t filterW = filterShape[2]; + + // For F(m x 1, r x 1), we only need to do left side transform. + bool leftTransform = filterH != 1; + // For F(1 x m, 1 x r), we only need to do right side transform. + bool rightTransform = filterW != 1; + Value transformedFilter = + filterTransform(rewriter, loc, filter, op.getOutput(), op.getM(), + op.getR(), leftTransform, rightTransform); + if (!transformedFilter) +return failure(); + + rewriter.replaceOp(op, transformedFilter); + + return transformedFilter.getDefiningOp(); +} + +FailureOr +decomposeWinogradInputTransformHelper(RewriterBase &rewriter, + linalg::WinogradInputTransformOp op) { + Location loc = op.getLoc(); + Value input = op.getInput(); + auto inputType = cast(input.getType()); + auto inputShape = inputType.getShape(); + int64_t inputH = inputShape[1]; + int64_t inputW = inputShape[2]; + + // For F(m x 1, r x 1), we only need to do left side transform. + bool leftTransform = inputH != 1; + // For F(1 x m, 1 x r), we only need to do right side transform. 
+ bool rightTransform = inputW != 1; + Value transformedInput = + inputTransform(rewriter, loc, op.getInput(), op.getOutput(), op.getM(), + op.getR(), leftTransform, rightTransform); + if (!transformedInput) +return failure(); + + rewriter.replaceOp(op, transformedInput); + + return transformedInput.getDefiningOp(); +} + +FailureOr +decomposeWinogradOutputTransformHelper(RewriterBase &rewriter, + linalg::WinogradOutputTransformOp op) { + Location loc = op.getLoc(); + Value value = op.getValue(); + auto valueType = cast(value.getType()); + auto valueShape = valueType.getShape(); + int64_t valueH = valueShape[2]; + int64_t valueW = valueShape[3]; + + // For F(m x 1, r x 1), we only need to do left side transform. + bool leftTransform = valueH != 1; + // For F(1 x m, 1 x r), we only need to do right side transform. + bool rightTransform = valueW != 1; + Value transformedOutput = + outputTransform(rewriter, loc, value, op.getOutput(), op.getM(), + op.getR(), leftTransform, rightTransform); + if (!transformedOutput) +return failure(); + + rewriter.replaceOp(op, transformedOutput); + + return transformedOutput.getDefiningOp(); +} + +class DecomposeWinogradFilterTransform final +: public OpRewritePattern { +public: + using OpRewritePattern::OpRewritePattern; + + LogicalResult matchAndRewrite(linalg::WinogradFilterTransformOp op, +PatternRewriter &rewriter) const override { +if (failed(decomposeWinogradFilterTransformHelper(rewriter, op))) + return failure(); + +return success(); + } +}; + +class DecomposeWinogradInputTransform final +: public OpRewritePattern { +public: + using OpRewritePattern::OpRewritePattern; + + LogicalResult matchAndRewrite(linalg::WinogradInputTransformOp op, +PatternRewriter &rewriter) const override { +if (failed(decomposeWinogradInputTransformHelper(rewriter, op))) + return failure(); + +return success(); Hsiangkai wrote: Done. 
https://github.com/llvm/llvm-project/pull/96183 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [mlir] [mlir][linalg] Decompose winograd operators (PR #96183)
@@ -323,5 +1089,12 @@ void populateWinogradConv2DPatterns(RewritePatternSet &patterns, int64_t m, patterns.insert(context, m, r); } +void populateDecomposeWinogradOpsPatterns(RewritePatternSet &patterns) { + MLIRContext *context = patterns.getContext(); + patterns.insert<DecomposeWinogradFilterTransform>(context); + patterns.insert<DecomposeWinogradInputTransform>(context); + patterns.insert<DecomposeWinogradOutputTransform>(context); Hsiangkai wrote: Done. https://github.com/llvm/llvm-project/pull/96183
[llvm-branch-commits] [mlir] [mlir][linalg] Implement TilingInterface for winograd operators (PR #96184)
@@ -2760,6 +2760,89 @@ LogicalResult WinogradFilterTransformOp::verify() { return success(); } +SmallVector +WinogradFilterTransformOp::getIterationDomain(OpBuilder &builder) { + Location loc = getLoc(); + Value zero = builder.create(loc, 0); + Value one = builder.create(loc, 1); + Value output = getOutput(); + SmallVector loopBounds(6); + for (unsigned dim = 0; dim < 6; ++dim) { +loopBounds[dim].offset = zero; +loopBounds[dim].size = getDimValue(builder, loc, output, dim); +loopBounds[dim].stride = one; + } + return loopBounds; +} + +SmallVector +WinogradFilterTransformOp::getLoopIteratorTypes() { + SmallVector iteratorTypes(6, + utils::IteratorType::parallel); + return iteratorTypes; +} + +Value getValueFromOpFoldResult(OpFoldResult opFoldResult, OpBuilder &builder, + Location loc) { + if (auto val = opFoldResult.dyn_cast()) { +return val; + } else if (auto attr = opFoldResult.dyn_cast()) { +auto intAttr = cast(attr); +return builder.create(loc, intAttr); + } Hsiangkai wrote: I only find a similar one in `mlir/lib/Dialect/Vector/IR/VectorOps.cpp` under `vector` namespace. It is to convert an array of `OpFoldResult` to an array of `Value`. https://github.com/llvm/llvm-project/pull/96184 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [mlir] [mlir][linalg] Implement TilingInterface for winograd operators (PR #96184)
@@ -2638,4 +2638,41 @@ def WinogradConv2DOp : Op { + let description = [{ +Decompose winograd operators. It will convert filter, input and output +transform operators into a combination of scf, tensor, and linalg Hsiangkai wrote: Done. https://github.com/llvm/llvm-project/pull/96184
[llvm-branch-commits] [mlir] [mlir][linalg] Implement TilingInterface for winograd operators (PR #96184)
@@ -2760,6 +2760,89 @@ LogicalResult WinogradFilterTransformOp::verify() { return success(); } +SmallVector<Range> +WinogradFilterTransformOp::getIterationDomain(OpBuilder &builder) { + Location loc = getLoc(); + Value zero = builder.create<arith::ConstantIndexOp>(loc, 0); + Value one = builder.create<arith::ConstantIndexOp>(loc, 1); Hsiangkai wrote: Done. https://github.com/llvm/llvm-project/pull/96184
[llvm-branch-commits] [mlir] [mlir][linalg] Implement TilingInterface for winograd operators (PR #96184)
@@ -2760,6 +2760,89 @@ LogicalResult WinogradFilterTransformOp::verify() { return success(); } +SmallVector +WinogradFilterTransformOp::getIterationDomain(OpBuilder &builder) { + Location loc = getLoc(); + Value zero = builder.create(loc, 0); + Value one = builder.create(loc, 1); + Value output = getOutput(); + SmallVector loopBounds(6); + for (unsigned dim = 0; dim < 6; ++dim) { +loopBounds[dim].offset = zero; +loopBounds[dim].size = getDimValue(builder, loc, output, dim); +loopBounds[dim].stride = one; + } + return loopBounds; +} + +SmallVector +WinogradFilterTransformOp::getLoopIteratorTypes() { + SmallVector iteratorTypes(6, + utils::IteratorType::parallel); + return iteratorTypes; +} + +Value getValueFromOpFoldResult(OpFoldResult opFoldResult, OpBuilder &builder, + Location loc) { + if (auto val = opFoldResult.dyn_cast()) { +return val; + } else if (auto attr = opFoldResult.dyn_cast()) { +auto intAttr = cast(attr); +return builder.create(loc, intAttr); + } + // This should never happen if OpFoldResult is correctly formed. Hsiangkai wrote: I updated the implementation. https://github.com/llvm/llvm-project/pull/96184 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [mlir] [mlir][linalg] Add transform operator for Winograd Conv2D algorithm (PR #96182)
@@ -3480,6 +3480,31 @@ DiagnosedSilenceableFailure transform::MapCopyToThreadsOp::applyToOne( return DiagnosedSilenceableFailure::success(); } +//===--===// +// WinogradConv2DOp +//===--===// + +DiagnosedSilenceableFailure transform::WinogradConv2DOp::applyToOne( +transform::TransformRewriter &rewriter, linalg::LinalgOp target, +transform::ApplyToEachResultList &results, +transform::TransformState &state) { + rewriter.setInsertionPoint(target); + auto maybeTransformed = + TypeSwitch>(target) + .Case([&](linalg::Conv2DNhwcFhwcOp op) { +return winogradConv2D(rewriter, op, getM(), getR()); + }) + .Default([&](Operation *op) { +return rewriter.notifyMatchFailure(op, "not supported"); + }); + + if (failed(maybeTransformed)) +return emitDefaultSilenceableFailure(target); Hsiangkai wrote: I have avoided a default message here. https://github.com/llvm/llvm-project/pull/96182 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [clang] c28e32d - [RISCV] Lazily add RVV C intrinsics.
Author: Hsiangkai Wang Date: 2021-10-10T20:22:30+08:00 New Revision: c28e32d20fd4a334e49f64a041cbefe7cd1e8ce3 URL: https://github.com/llvm/llvm-project/commit/c28e32d20fd4a334e49f64a041cbefe7cd1e8ce3 DIFF: https://github.com/llvm/llvm-project/commit/c28e32d20fd4a334e49f64a041cbefe7cd1e8ce3.diff LOG: [RISCV] Lazily add RVV C intrinsics. Leverage the method OpenCL uses that adds C intrinsics when the lookup failed. There is no need to define C intrinsics in the header file any more. It could help to avoid the large header file to speed up the compilation of RVV source code. Besides that, only the C intrinsics used by the users will be added into the declaration table. Added: Modified: clang/include/clang/Basic/CMakeLists.txt clang/lib/Sema/SemaLookup.cpp clang/utils/TableGen/RISCVVEmitter.cpp clang/utils/TableGen/TableGen.cpp clang/utils/TableGen/TableGenBackends.h llvm/docs/CommandGuide/tblgen.rst Removed: diff --git a/clang/include/clang/Basic/CMakeLists.txt b/clang/include/clang/Basic/CMakeLists.txt index 8cd891385a483..b930842ae8cfd 100644 --- a/clang/include/clang/Basic/CMakeLists.txt +++ b/clang/include/clang/Basic/CMakeLists.txt @@ -90,3 +90,6 @@ clang_tablegen(riscv_vector_builtins.inc -gen-riscv-vector-builtins clang_tablegen(riscv_vector_builtin_cg.inc -gen-riscv-vector-builtin-codegen SOURCE riscv_vector.td TARGET ClangRISCVVectorBuiltinCG) +clang_tablegen(riscv_vector_builtin_sema.inc -gen-riscv-vector-builtin-sema + SOURCE riscv_vector.td + TARGET ClangRISCVVectorBuiltinSema) diff --git a/clang/lib/Sema/SemaLookup.cpp b/clang/lib/Sema/SemaLookup.cpp index db6a01543d76a..28515f461545a 100644 --- a/clang/lib/Sema/SemaLookup.cpp +++ b/clang/lib/Sema/SemaLookup.cpp @@ -23,6 +23,8 @@ #include "clang/Basic/Builtins.h" #include "clang/Basic/FileManager.h" #include "clang/Basic/LangOptions.h" +#include "clang/Basic/TargetBuiltins.h" +#include "clang/Basic/TargetInfo.h" #include "clang/Lex/HeaderSearch.h" #include "clang/Lex/ModuleLoader.h" #include 
"clang/Lex/Preprocessor.h" @@ -48,6 +50,7 @@ #include #include "OpenCLBuiltins.inc" +#include "clang/Basic/riscv_vector_builtin_sema.inc" using namespace clang; using namespace sema; @@ -895,6 +898,68 @@ static void InsertOCLBuiltinDeclarationsFromTable(Sema &S, LookupResult &LR, LR.resolveKind(); } +static bool InsertRVVBuiltinDeclarationsFromTable(Sema &S, LookupResult &LR, + IdentifierInfo *II, + const TargetInfo &TI, + Preprocessor &PP) { + bool HasF = TI.hasFeature("f"); + bool HasD = TI.hasFeature("d"); + bool HasZfh = TI.hasFeature("experimental-zfh"); + bool HasZvamo = TI.hasFeature("experimental-zvamo"); + unsigned Features = 0; + if (HasF) +Features |= RISCVFeature_F; + if (HasD) +Features |= RISCVFeature_D; + if (HasZfh) +Features |= RISCVFeature_ZFH; + if (HasZvamo) +Features |= RISCVFeature_ZVAMO; + + const RVVIntrinsicInfo *Intrinsic = std::find_if( + std::begin(RVVIntrinsicInfos), std::end(RVVIntrinsicInfos), + [II](const RVVIntrinsicInfo &RVVII) { +return std::strcmp(RVVII.TargetName, II->getName().data()) == 0; + }); + if (Intrinsic != std::end(RVVIntrinsicInfos)) { +if ((Intrinsic->RequireFeatures & Features) != Intrinsic->RequireFeatures) + return false; +if (NamedDecl *FD = +S.LazilyCreateBuiltin(II, Intrinsic->TargetBuiltinID, S.TUScope, + LR.isForRedeclaration(), LR.getNameLoc())) { + LR.addDecl(FD); + return true; +} + } + + bool Found = false; + std::for_each( + std::begin(RVVIntrinsicOverloadInfos), + std::end(RVVIntrinsicOverloadInfos), + [&S, &LR, II, &PP, &Found, + Features](const RVVIntrinsicOverloadInfo &RVVII) { +if (std::strcmp(RVVII.OverloadName, II->getName().data()) == 0) { + if ((RVVII.RequireFeatures & Features) != RVVII.RequireFeatures) +return; + if (NamedDecl *FD = S.LazilyCreateBuiltin( + II, RVVII.TargetBuiltinID, S.TUScope, LR.isForRedeclaration(), + LR.getNameLoc())) { +auto &IntrinsicII = PP.getIdentifierTable().get(RVVII.TargetName); +FD->addAttr(OverloadableAttr::CreateImplicit(S.Context)); +FD->addAttr( 
+BuiltinAliasAttr::CreateImplicit(S.Context, &IntrinsicII)); +LR.addDecl(FD); +Found = true; + } +} + }); + + if (Found) +LR.resolveKind(); + + return Found; +} + /// Lookup a builtin function, when name lookup would otherwise /// fail. bool Sema::LookupBuiltin(LookupResult &R) { @@ -927,6 +992,12 @@ bool Sema::LookupBuiltin(LookupResult &R) { } } + const TargetI
[llvm-branch-commits] [llvm] 914e2f5 - [NFC] Use generic name for scalable vector stack ID.
Author: Hsiangkai Wang Date: 2021-01-13T10:57:43+08:00 New Revision: 914e2f5a02f4f896eec9a00f536d1118bf1d9961 URL: https://github.com/llvm/llvm-project/commit/914e2f5a02f4f896eec9a00f536d1118bf1d9961 DIFF: https://github.com/llvm/llvm-project/commit/914e2f5a02f4f896eec9a00f536d1118bf1d9961.diff LOG: [NFC] Use generic name for scalable vector stack ID. Differential Revision: https://reviews.llvm.org/D94471 Added: Modified: llvm/include/llvm/CodeGen/MIRYamlMapping.h llvm/include/llvm/CodeGen/TargetFrameLowering.h llvm/lib/Target/AArch64/AArch64FrameLowering.cpp llvm/lib/Target/AArch64/AArch64FrameLowering.h llvm/lib/Target/AArch64/AArch64ISelLowering.cpp llvm/lib/Target/AArch64/AArch64InstrInfo.cpp llvm/lib/Target/AMDGPU/SIFrameLowering.cpp llvm/test/CodeGen/AArch64/debug-info-sve-dbg-declare.mir llvm/test/CodeGen/AArch64/debug-info-sve-dbg-value.mir llvm/test/CodeGen/AArch64/framelayout-sve-basepointer.mir llvm/test/CodeGen/AArch64/framelayout-sve-calleesaves-fix.mir llvm/test/CodeGen/AArch64/framelayout-sve-scavengingslot.mir llvm/test/CodeGen/AArch64/framelayout-sve.mir llvm/test/CodeGen/AArch64/live-debugvalues-sve.mir llvm/test/CodeGen/AArch64/spillfill-sve.mir llvm/test/CodeGen/AArch64/sve-alloca-stackid.ll llvm/test/CodeGen/AArch64/sve-calling-convention-byref.ll llvm/test/CodeGen/AArch64/sve-localstackalloc.mir Removed: diff --git a/llvm/include/llvm/CodeGen/MIRYamlMapping.h b/llvm/include/llvm/CodeGen/MIRYamlMapping.h index f7006517e3df..4a7406473b11 100644 --- a/llvm/include/llvm/CodeGen/MIRYamlMapping.h +++ b/llvm/include/llvm/CodeGen/MIRYamlMapping.h @@ -347,7 +347,7 @@ struct ScalarEnumerationTraits { static void enumeration(yaml::IO &IO, TargetStackID::Value &ID) { IO.enumCase(ID, "default", TargetStackID::Default); IO.enumCase(ID, "sgpr-spill", TargetStackID::SGPRSpill); -IO.enumCase(ID, "sve-vec", TargetStackID::SVEVector); +IO.enumCase(ID, "scalable-vector", TargetStackID::ScalableVector); IO.enumCase(ID, "noalloc", TargetStackID::NoAlloc); } }; diff 
--git a/llvm/include/llvm/CodeGen/TargetFrameLowering.h b/llvm/include/llvm/CodeGen/TargetFrameLowering.h index c6806793f248..792452f6e81d 100644 --- a/llvm/include/llvm/CodeGen/TargetFrameLowering.h +++ b/llvm/include/llvm/CodeGen/TargetFrameLowering.h @@ -27,7 +27,7 @@ namespace TargetStackID { enum Value { Default = 0, SGPRSpill = 1, -SVEVector = 2, +ScalableVector = 2, NoAlloc = 255 }; } diff --git a/llvm/lib/Target/AArch64/AArch64FrameLowering.cpp b/llvm/lib/Target/AArch64/AArch64FrameLowering.cpp index 1687fd0116a5..65ee5016042c 100644 --- a/llvm/lib/Target/AArch64/AArch64FrameLowering.cpp +++ b/llvm/lib/Target/AArch64/AArch64FrameLowering.cpp @@ -249,7 +249,7 @@ static unsigned estimateRSStackSizeLimit(MachineFunction &MF) { TargetStackID::Value AArch64FrameLowering::getStackIDForScalableVectors() const { - return TargetStackID::SVEVector; + return TargetStackID::ScalableVector; } /// Returns the size of the fixed object area (allocated next to sp on entry) @@ -496,7 +496,7 @@ void AArch64FrameLowering::emitCalleeSavedFrameMoves( continue; StackOffset Offset; -if (MFI.getStackID(Info.getFrameIdx()) == TargetStackID::SVEVector) { +if (MFI.getStackID(Info.getFrameIdx()) == TargetStackID::ScalableVector) { AArch64FunctionInfo *AFI = MF.getInfo(); Offset = StackOffset::getScalable(MFI.getObjectOffset(Info.getFrameIdx())) - @@ -1856,7 +1856,7 @@ StackOffset AArch64FrameLowering::resolveFrameIndexReference( const auto &MFI = MF.getFrameInfo(); int64_t ObjectOffset = MFI.getObjectOffset(FI); bool isFixed = MFI.isFixedObjectIndex(FI); - bool isSVE = MFI.getStackID(FI) == TargetStackID::SVEVector; + bool isSVE = MFI.getStackID(FI) == TargetStackID::ScalableVector; return resolveFrameOffsetReference(MF, ObjectOffset, isFixed, isSVE, FrameReg, PreferFP, ForSimm); } @@ -2412,7 +2412,7 @@ bool AArch64FrameLowering::spillCalleeSavedRegisters( // Update the StackIDs of the SVE stack slots. 
MachineFrameInfo &MFI = MF.getFrameInfo(); if (RPI.Type == RegPairInfo::ZPR || RPI.Type == RegPairInfo::PPR) - MFI.setStackID(RPI.FrameIdx, TargetStackID::SVEVector); + MFI.setStackID(RPI.FrameIdx, TargetStackID::ScalableVector); } return true; @@ -2761,7 +2761,7 @@ static int64_t determineSVEStackObjectOffsets(MachineFrameInfo &MFI, #ifndef NDEBUG // First process all fixed stack objects. for (int I = MFI.getObjectIndexBegin(); I != 0; ++I) -assert(MFI.getStackID(I) != TargetStackID::SVEVector && +assert(MFI.getStackID(I) != TargetStackID::ScalableVector && "SVE vectors should never be passed on the stack by value, only by " "reference."); #en
[llvm-branch-commits] [llvm] 619eb14 - [NFC][RISCV] Remove useless code in RISCVRegisterInfo.td.
Author: Hsiangkai Wang Date: 2021-01-15T20:08:51+08:00 New Revision: 619eb14775990d610236288f414a486d86df47cc URL: https://github.com/llvm/llvm-project/commit/619eb14775990d610236288f414a486d86df47cc DIFF: https://github.com/llvm/llvm-project/commit/619eb14775990d610236288f414a486d86df47cc.diff LOG: [NFC][RISCV] Remove useless code in RISCVRegisterInfo.td. Differential Revision: https://reviews.llvm.org/D94750 Added: Modified: llvm/lib/Target/RISCV/RISCVRegisterInfo.td Removed: diff --git a/llvm/lib/Target/RISCV/RISCVRegisterInfo.td b/llvm/lib/Target/RISCV/RISCVRegisterInfo.td index fdac1eeb4fe4..99f74bfc2a09 100644 --- a/llvm/lib/Target/RISCV/RISCVRegisterInfo.td +++ b/llvm/lib/Target/RISCV/RISCVRegisterInfo.td @@ -383,10 +383,6 @@ let RegAltNameIndices = [ABIRegAltName] in { def VXRM : RISCVReg<0, "vxrm", ["vxrm"]>; } -class RegisterTypes reg_types> { - list types = reg_types; -} - class VReg regTypes, dag regList, int Vlmul> : RegisterClass<"RISCV", regTypes, ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] 098dbf1 - [RISCV] Correct alignment settings for vector registers.
Author: Hsiangkai Wang Date: 2021-01-16T23:21:29+08:00 New Revision: 098dbf190a5586d02f48b84eb41b93b701cdeb97 URL: https://github.com/llvm/llvm-project/commit/098dbf190a5586d02f48b84eb41b93b701cdeb97 DIFF: https://github.com/llvm/llvm-project/commit/098dbf190a5586d02f48b84eb41b93b701cdeb97.diff LOG: [RISCV] Correct alignment settings for vector registers. According to "9. Vector Memory Alignment Constraints" in the V specification, a vector memory access must be aligned to the size of its element. Our current implementation supports ELEN up to 64, so we can assume an alignment of 64 for vector registers. Differential Revision: https://reviews.llvm.org/D94751 Added: Modified: llvm/lib/Target/RISCV/RISCVRegisterInfo.td Removed: diff --git a/llvm/lib/Target/RISCV/RISCVRegisterInfo.td b/llvm/lib/Target/RISCV/RISCVRegisterInfo.td index 99f74bfc2a09..75615fd334b7 100644 --- a/llvm/lib/Target/RISCV/RISCVRegisterInfo.td +++ b/llvm/lib/Target/RISCV/RISCVRegisterInfo.td @@ -386,11 +386,10 @@ let RegAltNameIndices = [ABIRegAltName] in { class VReg regTypes, dag regList, int Vlmul> : RegisterClass<"RISCV", regTypes, - // FIXME: Spill alignment set to 16 bytes. - 128, + 64, // The maximum supported ELEN is 64. regList> { int VLMul = Vlmul; - int Size = !mul(Vlmul, 64); // FIXME: assuming ELEN=64 + int Size = !mul(Vlmul, 64); } def VR : VReg<[vint8mf2_t, vint8mf4_t, vint8mf8_t,
[llvm-branch-commits] [llvm] 9dd5aea - [RISCV] Make LMUL field in VTYPE continuous.
Author: Hsiangkai Wang Date: 2021-01-22T00:47:32+08:00 New Revision: 9dd5aea1e0397f693a739bffb03fd94dc8e1ec79 URL: https://github.com/llvm/llvm-project/commit/9dd5aea1e0397f693a739bffb03fd94dc8e1ec79 DIFF: https://github.com/llvm/llvm-project/commit/9dd5aea1e0397f693a739bffb03fd94dc8e1ec79.diff LOG: [RISCV] Make LMUL field in VTYPE continuous. Upgrade RISC-V V extension to v1.0-08a0b46. Update the VTYPE encoding. Make LMUL encoding in a continuous field. Added: Modified: llvm/lib/Target/RISCV/MCTargetDesc/RISCVBaseInfo.h llvm/test/CodeGen/RISCV/rvv/add-vsetvli-gpr.mir llvm/test/CodeGen/RISCV/rvv/add-vsetvli-vlmax.ll llvm/test/MC/RISCV/rvv/snippet.s llvm/test/MC/RISCV/rvv/vsetvl.s Removed: diff --git a/llvm/lib/Target/RISCV/MCTargetDesc/RISCVBaseInfo.h b/llvm/lib/Target/RISCV/MCTargetDesc/RISCVBaseInfo.h index cd81e5a07975..6c9f860c204c 100644 --- a/llvm/lib/Target/RISCV/MCTargetDesc/RISCVBaseInfo.h +++ b/llvm/lib/Target/RISCV/MCTargetDesc/RISCVBaseInfo.h @@ -353,17 +353,13 @@ inline static bool isValidLMUL(unsigned LMUL, bool Fractional) { // -++ // 7| vma| Vector mask agnostic // 6| vta| Vector tail agnostic -// 5| vlmul[2] | Fractional lmul? -// 4:2 | vsew[2:0] | Standard element width (SEW) setting -// 1:0 | vlmul[1:0] | Vector register group multiplier (LMUL) setting -// -// TODO: This format will change for the V extensions spec v1.0. 
+// 5:3 | vsew[2:0] | Standard element width (SEW) setting +// 2:0 | vlmul[2:0] | Vector register group multiplier (LMUL) setting inline static unsigned encodeVTYPE(RISCVVLMUL VLMUL, RISCVVSEW VSEW, bool TailAgnostic, bool MaskAgnostic) { unsigned VLMULBits = static_cast(VLMUL); unsigned VSEWBits = static_cast(VSEW); - unsigned VTypeI = - ((VLMULBits & 0x4) << 3) | (VSEWBits << 2) | (VLMULBits & 0x3); + unsigned VTypeI = (VSEWBits << 3) | (VLMULBits & 0x7); if (TailAgnostic) VTypeI |= 0x40; if (MaskAgnostic) @@ -372,14 +368,13 @@ inline static unsigned encodeVTYPE(RISCVVLMUL VLMUL, RISCVVSEW VSEW, return VTypeI; } -// TODO: This format will change for the V extensions spec v1.0. inline static RISCVVLMUL getVLMUL(unsigned VType) { - unsigned VLMUL = (VType & 0x3) | ((VType & 0x20) >> 3); + unsigned VLMUL = VType & 0x7; return static_cast(VLMUL); } inline static RISCVVSEW getVSEW(unsigned VType) { - unsigned VSEW = (VType >> 2) & 0x7; + unsigned VSEW = (VType >> 3) & 0x7; return static_cast(VSEW); } diff --git a/llvm/test/CodeGen/RISCV/rvv/add-vsetvli-gpr.mir b/llvm/test/CodeGen/RISCV/rvv/add-vsetvli-gpr.mir index f323bf1b3161..a93ec2c55cb8 100644 --- a/llvm/test/CodeGen/RISCV/rvv/add-vsetvli-gpr.mir +++ b/llvm/test/CodeGen/RISCV/rvv/add-vsetvli-gpr.mir @@ -39,13 +39,13 @@ body: | # POST-INSERTER: %1:gpr = COPY $x12 # POST-INSERTER: %2:gpr = COPY $x11 # POST-INSERTER: %3:gpr = COPY $x10 -# POST-INSERTER: dead %7:gpr = PseudoVSETVLI %0, 76, implicit-def $vl, implicit-def $vtype +# POST-INSERTER: dead %7:gpr = PseudoVSETVLI %0, 88, implicit-def $vl, implicit-def $vtype # POST-INSERTER: %4:vr = PseudoVLE64_V_M1 %2, $noreg, -1, implicit $vl, implicit $vtype :: (load unknown-size from %ir.pa, align 8) -# POST-INSERTER: dead %8:gpr = PseudoVSETVLI %0, 76, implicit-def $vl, implicit-def $vtype +# POST-INSERTER: dead %8:gpr = PseudoVSETVLI %0, 88, implicit-def $vl, implicit-def $vtype # POST-INSERTER: %5:vr = PseudoVLE64_V_M1 %1, $noreg, -1, implicit $vl, implicit $vtype :: 
(load unknown-size from %ir.pb, align 8) -# POST-INSERTER: dead %9:gpr = PseudoVSETVLI %0, 76, implicit-def $vl, implicit-def $vtype +# POST-INSERTER: dead %9:gpr = PseudoVSETVLI %0, 88, implicit-def $vl, implicit-def $vtype # POST-INSERTER: %6:vr = PseudoVADD_VV_M1 killed %4, killed %5, $noreg, -1, implicit $vl, implicit $vtype -# POST-INSERTER: dead %10:gpr = PseudoVSETVLI %0, 76, implicit-def $vl, implicit-def $vtype +# POST-INSERTER: dead %10:gpr = PseudoVSETVLI %0, 88, implicit-def $vl, implicit-def $vtype # POST-INSERTER: PseudoVSE64_V_M1 killed %6, %3, $noreg, -1, implicit $vl, implicit $vtype :: (store unknown-size into %ir.pc, align 8) # CODEGEN: vsetvli a3, a3, e64,m1,ta,mu diff --git a/llvm/test/CodeGen/RISCV/rvv/add-vsetvli-vlmax.ll b/llvm/test/CodeGen/RISCV/rvv/add-vsetvli-vlmax.ll index 53f316f61e92..eec35b114e79 100644 --- a/llvm/test/CodeGen/RISCV/rvv/add-vsetvli-vlmax.ll +++ b/llvm/test/CodeGen/RISCV/rvv/add-vsetvli-vlmax.ll @@ -25,11 +25,11 @@ define void @vadd_vint64m1( ; PRE-INSERTER: %5:vr = PseudoVADD_VV_M1 killed %3, killed %4, $x0, 64, implicit $vl, implicit $vtype ; PRE-INSERTER: PseudoVSE64_V_M1 killed %5, %0, $x0, 64, implicit $vl, implicit $vtype :: (store unknown-size into %ir.pc, align 8) -; POST-INSERTER: dead %6:gpr =
[llvm-branch-commits] [llvm] 266820b - [RISCV] Add new V instructions in v1.0-08a0b46.
Author: Hsiangkai Wang Date: 2021-01-22T00:59:58+08:00 New Revision: 266820be352d5b824cb01c93df1b00184fcc7803 URL: https://github.com/llvm/llvm-project/commit/266820be352d5b824cb01c93df1b00184fcc7803 DIFF: https://github.com/llvm/llvm-project/commit/266820be352d5b824cb01c93df1b00184fcc7803.diff LOG: [RISCV] Add new V instructions in v1.0-08a0b46. Add new V instructions. vfrsqrte7.v vfrece7.v vrgatherei16.vv vneg.v vncvt.x.x.w vfneg.v Added: llvm/test/MC/RISCV/rvv/aliases.s Modified: llvm/lib/Target/RISCV/RISCVInstrInfoV.td llvm/test/MC/RISCV/rvv/fothers.s llvm/test/MC/RISCV/rvv/others.s Removed: diff --git a/llvm/lib/Target/RISCV/RISCVInstrInfoV.td b/llvm/lib/Target/RISCV/RISCVInstrInfoV.td index aa505b22afd8..7d3f5071ee9e 100644 --- a/llvm/lib/Target/RISCV/RISCVInstrInfoV.td +++ b/llvm/lib/Target/RISCV/RISCVInstrInfoV.td @@ -512,6 +512,8 @@ defm VADD_V : VALU_IV_V_X_I<"vadd", 0b00>; defm VSUB_V : VALU_IV_V_X<"vsub", 0b10>; defm VRSUB_V : VALU_IV_X_I<"vrsub", 0b11>; +def : InstAlias<"vneg.v $vd, $vs$vm", (VRSUB_VX VR:$vd, VR:$vs, X0, VMaskOp:$vm)>; + // Vector Widening Integer Add/Subtract // Refer to 11.2 Widening Vector Arithmetic Instructions // The destination vector register group cannot overlap a source vector @@ -584,6 +586,9 @@ defm VNSRL_W : VALU_IV_V_X_I<"vnsrl", 0b101100, uimm5, "w">; defm VNSRA_W : VALU_IV_V_X_I<"vnsra", 0b101101, uimm5, "w">; } // Constraints = "@earlyclobber $vd", RVVConstraint = Narrow +def : InstAlias<"vncvt.x.x.w $vd, $vs$vm", +(VNSRL_WX VR:$vd, VR:$vs, X0, VMaskOp:$vm)>; + // Vector Integer Comparison Instructions let RVVConstraint = NoConstraint in { defm VMSEQ_V : VALU_IV_V_X_I<"vmseq", 0b011000>; @@ -784,6 +789,8 @@ defm VFWNMSAC_V : VALUr_FV_V_F<"vfwnmsac", 0b11>; // Vector Floating-Point Square-Root Instruction defm VFSQRT_V : VALU_FV_VS2<"vfsqrt.v", 0b010011, 0b0>; +defm VFRSQRTE7_V : VALU_FV_VS2<"vfrsqrte7.v", 0b010011, 0b00100>; +defm VFRECE7_V : VALU_FV_VS2<"vfrece7.v", 0b010011, 0b00101>; // Vector Floating-Point MIN/MAX 
Instructions defm VFMIN_V : VALU_FV_V_F<"vfmin", 0b000100>; @@ -794,6 +801,9 @@ defm VFSGNJ_V : VALU_FV_V_F<"vfsgnj", 0b001000>; defm VFSGNJN_V : VALU_FV_V_F<"vfsgnjn", 0b001001>; defm VFSGNJX_V : VALU_FV_V_F<"vfsgnjx", 0b001010>; +def : InstAlias<"vfneg.v $vd, $vs$vm", +(VFSGNJN_VV VR:$vd, VR:$vs, VR:$vs, VMaskOp:$vm)>; + // Vector Floating-Point Compare Instructions let RVVConstraint = NoConstraint in { defm VMFEQ_V : VALU_FV_V_F<"vmfeq", 0b011000>; @@ -1010,6 +1020,7 @@ let Predicates = [HasStdExtV] in { // Vector Register Gather Instruction let Constraints = "@earlyclobber $vd", RVVConstraint = Vrgather in { defm VRGATHER_V : VALU_IV_V_X_I<"vrgather", 0b001100, uimm5>; +def VRGATHEREI16_VV : VALUVV<0b001110, OPIVV, "vrgatherei16.vv">; } // Constraints = "@earlyclobber $vd", RVVConstraint = Vrgather // Vector Compress Instruction diff --git a/llvm/test/MC/RISCV/rvv/aliases.s b/llvm/test/MC/RISCV/rvv/aliases.s new file mode 100644 index ..7f937dcfcfd9 --- /dev/null +++ b/llvm/test/MC/RISCV/rvv/aliases.s @@ -0,0 +1,65 @@ +# RUN: llvm-mc --triple=riscv64 -mattr +experimental-v < %s --show-encoding 2>&1 \ +# RUN: -mattr +d | FileCheck --check-prefix=ALIAS %s +# RUN: llvm-mc --triple=riscv64 -mattr=+experimental-v --riscv-no-aliases < %s \ +# RUN: -mattr +d --show-encoding 2>&1 | FileCheck --check-prefix=NO-ALIAS %s + +# ALIAS:vwcvt.x.x.v v2, v1, v0.t# encoding: [0x57,0x61,0x10,0xc4] +# NO-ALIAS: vwadd.vxv2, v1, zero, v0.t # encoding: [0x57,0x61,0x10,0xc4] +vwcvt.x.x.v v2, v1, v0.t +# ALIAS:vwcvtu.x.x.vv2, v1, v0.t# encoding: [0x57,0x61,0x10,0xc0] +# NO-ALIAS: vwaddu.vx v2, v1, zero, v0.t # encoding: [0x57,0x61,0x10,0xc0] +vwcvtu.x.x.v v2, v1, v0.t +# ALIAS:vnot.v v2, v2, v0.t# encoding: [0x57,0xb1,0x2f,0x2c] +# NO-ALIAS: vxor.vi v2, v2, -1, v0.t# encoding: [0x57,0xb1,0x2f,0x2c] +vnot.v v2, v2, v0.t +# ALIAS:vmsltu.vv v2, v1, v2, v0.t # encoding: [0x57,0x01,0x11,0x68] +# NO-ALIAS: vmsltu.vv v2, v1, v2, v0.t # encoding: [0x57,0x01,0x11,0x68] +vmsgtu.vv v2, v2, v1, 
v0.t +# ALIAS:vmslt.vvv2, v1, v2, v0.t # encoding: [0x57,0x01,0x11,0x6c] +# NO-ALIAS: vmslt.vvv2, v1, v2, v0.t # encoding: [0x57,0x01,0x11,0x6c] +vmsgt.vv v2, v2, v1, v0.t +# ALIAS:vmsleu.vv v2, v1, v2, v0.t # encoding: [0x57,0x01,0x11,0x70] +# NO-ALIAS: vmsleu.vv v2, v1, v2, v0.t # encoding: [0x57,0x01,0x11,0x70] +vmsgeu.vv v2, v2, v1, v0.t +# ALIAS:vmsle.vvv2, v1, v2, v0.t # encoding: [0x57,0x01,0x11,0x74] +# NO-ALIAS: vmsle.vvv2, v1, v2, v0.t # encoding: [0x57,0x01,0x11,0x74] +vmsge.vv v2, v2, v1, v0.t +# ALIAS:vmsleu.vi v2, v2, 15, v0.t # encoding: [0x57,0xb1,0x27,0x70] +# NO-ALIAS: vmsleu.vi v2, v2, 15, v0.t #
[llvm-branch-commits] [llvm] b8921af - [RISCV] Update V instructions constraints to conform to v1.0
Author: Hsiangkai Wang Date: 2021-01-22T01:15:55+08:00 New Revision: b8921af63b0d746606f2482f3a020ea7cd316cd2 URL: https://github.com/llvm/llvm-project/commit/b8921af63b0d746606f2482f3a020ea7cd316cd2 DIFF: https://github.com/llvm/llvm-project/commit/b8921af63b0d746606f2482f3a020ea7cd316cd2.diff LOG: [RISCV] Update V instructions constraints to conform to v1.0 Upgrade RISC-V V extension to v1.0-08a0b46. Update instruction constraints to conform to v1.0. Differential Revision: https://reviews.llvm.org/D93612 Added: Modified: llvm/lib/Target/RISCV/RISCVInstrFormats.td llvm/lib/Target/RISCV/RISCVInstrInfoV.td llvm/test/MC/RISCV/rvv/add.s llvm/test/MC/RISCV/rvv/convert.s llvm/test/MC/RISCV/rvv/invalid.s llvm/test/MC/RISCV/rvv/shift.s llvm/test/MC/RISCV/rvv/sub.s Removed: diff --git a/llvm/lib/Target/RISCV/RISCVInstrFormats.td b/llvm/lib/Target/RISCV/RISCVInstrFormats.td index ea867c549e64..7be74b79d99b 100644 --- a/llvm/lib/Target/RISCV/RISCVInstrFormats.td +++ b/llvm/lib/Target/RISCV/RISCVInstrFormats.td @@ -64,22 +64,21 @@ def VMConstraint : RISCVVConstraint<0b100>; // register is being written with a mask value (e.g., comparisons) or the // scalar result of a reduction. // -// * Widening: The destination vector register group cannot overlap a source -// vector register group of a diff erent EEW +// * Widening: The destination EEW is greater than the source EEW, the source +// EMUL is at least 1. The destination vector register group cannot overlap +// with the source vector register groups besides the highest-numbered part of +// the destination register group. // -// * Narrowing: The destination vector register group cannot overlap the -// first source vector register group +// * Narrowing: The destination EEW is smaller than the source EEW. The +// destination vector register group cannot overlap with the source vector +// register groups besides the lowest-numbered part of the source register +// group. 
// -// * For vadc and vsbc, an illegal instruction exception is raised if the -// destination vector register is v0. +// * vmsbf.m/vmsif.m/vmsof.m: The destination register cannot overlap the +// source register and, if masked, cannot overlap the mask register ('v0'). // -// * For vmadc and vmsbc, an illegal instruction exception is raised if the -// destination vector register overlaps a source vector register group. -// -// * viota: An illegal instruction exception is raised if the destination -// vector register group overlaps the source vector mask register. If the -// instruction is masked, an illegal instruction exception is issued if the -// destination vector register group overlaps v0. +// * viota: The destination register cannot overlap the source register and, +// if masked, cannot overlap the mask register ('v0'). // // * v[f]slide[1]up: The destination vector register group for vslideup cannot // overlap the source vector register group. @@ -96,12 +95,6 @@ def WidenW : RISCVVConstraint; def WidenCvt : RISCVVConstraint; -def Narrow : RISCVVConstraint; -def NarrowCvt: RISCVVConstraint; -def Vmadc: RISCVVConstraint; def Iota : RISCVVConstraint; def SlideUp : RISCVVConstraint; // Vector Integer Add-with-Carry / Subtract-with-Borrow Instructions defm VADC_V : VALUm_IV_V_X_I<"vadc", 0b01>; -let Constraints = "@earlyclobber $vd", RVVConstraint = Vmadc in { +let Constraints = "@earlyclobber $vd", RVVConstraint = NoConstraint in { defm VMADC_V : VALUm_IV_V_X_I<"vmadc", 0b010001>; defm VMADC_V : VALUNoVm_IV_V_X_I<"vmadc", 0b010001>; -} // Constraints = "@earlyclobber $vd", RVVConstraint = Vmadc +} // Constraints = "@earlyclobber $vd", RVVConstraint = NoConstraint defm VSBC_V : VALUm_IV_V_X<"vsbc", 0b010010>; -let Constraints = "@earlyclobber $vd", RVVConstraint = Vmadc in { +let Constraints = "@earlyclobber $vd", RVVConstraint = NoConstraint in { defm VMSBC_V : VALUm_IV_V_X<"vmsbc", 0b010011>; defm VMSBC_V : VALUNoVm_IV_V_X<"vmsbc", 0b010011>; -} // Constraints 
= "@earlyclobber $vd", RVVConstraint = Vmadc +} // Constraints = "@earlyclobber $vd", RVVConstraint = NoConstraint // Vector Bitwise Logical Instructions defm VAND_V : VALU_IV_V_X_I<"vand", 0b001001>; @@ -581,10 +581,10 @@ defm VSRA_V : VALU_IV_V_X_I<"vsra", 0b101001, uimm5>; // The destination vector register group cannot overlap the first source // vector register group (specified by vs2). The destination vector register // group cannot overlap the mask register if used, unless LMUL=1. -let Constraints = "@earlyclobber $vd", RVVConstraint = Narrow in { +let Constraints = "@earlyclobber $vd" in { defm VNSRL_W : VALU_IV_V_X_I<"vnsrl", 0b101100, uimm5, "w">; defm VNSRA_W : VALU_IV_V_X_I<"vnsra", 0b101101, uimm5, "w">; -} // Constraints = "@earlyclobber $vd", RVVConstraint = Narrow +} // Constraints = "@earlyclobber $vd" def : InstAlias
[llvm-branch-commits] [llvm] 5d35422 - [RISCV] Correct DWARF number for vector registers.
Author: Hsiangkai Wang Date: 2021-01-22T11:33:42+08:00 New Revision: 5d354220d44f11c70f36d5a357ec2a2208a6ab92 URL: https://github.com/llvm/llvm-project/commit/5d354220d44f11c70f36d5a357ec2a2208a6ab92 DIFF: https://github.com/llvm/llvm-project/commit/5d354220d44f11c70f36d5a357ec2a2208a6ab92.diff LOG: [RISCV] Correct DWARF number for vector registers. The DWARF numbers of the vector registers are already defined in the riscv-elf-psabi: they start from 96. Correct the DWARF numbers of the vector registers accordingly. Differential Revision: https://reviews.llvm.org/D94749 Added: Modified: llvm/lib/Target/RISCV/RISCVRegisterInfo.td Removed: diff --git a/llvm/lib/Target/RISCV/RISCVRegisterInfo.td b/llvm/lib/Target/RISCV/RISCVRegisterInfo.td index 3b79a10f111b..e1a11fd9389f 100644 --- a/llvm/lib/Target/RISCV/RISCVRegisterInfo.td +++ b/llvm/lib/Target/RISCV/RISCVRegisterInfo.td @@ -411,7 +411,7 @@ class VRegList LIn, int start, int nf, int lmul> { // Vector registers let RegAltNameIndices = [ABIRegAltName] in { foreach Index = 0-31 in { -def V#Index : RISCVReg, DwarfRegNum<[!add(Index, 64)]>; +def V#Index : RISCVReg, DwarfRegNum<[!add(Index, 96)]>; } foreach Index = [0, 2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22,
[llvm-branch-commits] [llvm] 97e33fe - [RISCV] Implement vloxseg/vluxseg intrinsics.
Author: Hsiangkai Wang Date: 2021-01-23T08:54:56+08:00 New Revision: 97e33feb08aa9c042408862e555423f037753e12 URL: https://github.com/llvm/llvm-project/commit/97e33feb08aa9c042408862e555423f037753e12 DIFF: https://github.com/llvm/llvm-project/commit/97e33feb08aa9c042408862e555423f037753e12.diff LOG: [RISCV] Implement vloxseg/vluxseg intrinsics. Define vloxseg/vluxseg intrinsics and pseudo instructions. Lower vloxseg/vluxseg intrinsics to pseudo instructions in RISCVDAGToDAGISel. Differential Revision: https://reviews.llvm.org/D94903 Added: Modified: llvm/include/llvm/IR/IntrinsicsRISCV.td llvm/lib/Target/RISCV/RISCVISelDAGToDAG.cpp llvm/lib/Target/RISCV/RISCVISelDAGToDAG.h llvm/lib/Target/RISCV/RISCVISelLowering.h llvm/lib/Target/RISCV/RISCVInstrInfoVPseudos.td Removed: diff --git a/llvm/include/llvm/IR/IntrinsicsRISCV.td b/llvm/include/llvm/IR/IntrinsicsRISCV.td index 407b27744477..a9629806e875 100644 --- a/llvm/include/llvm/IR/IntrinsicsRISCV.td +++ b/llvm/include/llvm/IR/IntrinsicsRISCV.td @@ -543,6 +543,25 @@ let TargetPrefix = "riscv" in { LLVMMatchType<1>]), [NoCapture>, IntrReadMem]>, RISCVVIntrinsic; + // For indexed segment load + // Input: (pointer, index, vl) + class RISCVISegLoad +: Intrinsic, +!add(nf, -1))), +[LLVMPointerToElt<0>, llvm_anyvector_ty, llvm_anyint_ty], +[NoCapture>, IntrReadMem]>, RISCVVIntrinsic; + // For indexed segment load with mask + // Input: (maskedoff, pointer, index, mask, vl) + class RISCVISegLoadMask +: Intrinsic, +!add(nf, -1))), +!listconcat(!listsplat(LLVMMatchType<0>, nf), +[LLVMPointerToElt<0>, + llvm_anyvector_ty, + LLVMScalarOrSameVectorWidth<0, llvm_i1_ty>, + llvm_anyint_ty]), +[NoCapture>, IntrReadMem]>, RISCVVIntrinsic; + // For unit stride segment store // Input: (value, pointer, vl) class RISCVUSSegStore @@ -696,6 +715,10 @@ let TargetPrefix = "riscv" in { def "int_riscv_" # NAME : RISCVSSegLoad; def "int_riscv_" # NAME # "_mask" : RISCVSSegLoadMask; } + multiclass RISCVISegLoad { +def "int_riscv_" # NAME : 
RISCVISegLoad; +def "int_riscv_" # NAME # "_mask" : RISCVISegLoadMask; + } multiclass RISCVUSSegStore { def "int_riscv_" # NAME : RISCVUSSegStore; def "int_riscv_" # NAME # "_mask" : RISCVUSSegStoreMask; @@ -1002,6 +1025,8 @@ let TargetPrefix = "riscv" in { foreach nf = [2, 3, 4, 5, 6, 7, 8] in { defm vlseg # nf : RISCVUSSegLoad; defm vlsseg # nf : RISCVSSegLoad; +defm vloxseg # nf : RISCVISegLoad; +defm vluxseg # nf : RISCVISegLoad; defm vsseg # nf : RISCVUSSegStore; defm vssseg # nf : RISCVSSegStore; } diff --git a/llvm/lib/Target/RISCV/RISCVISelDAGToDAG.cpp b/llvm/lib/Target/RISCV/RISCVISelDAGToDAG.cpp index 4c873a0482f9..81972d88f630 100644 --- a/llvm/lib/Target/RISCV/RISCVISelDAGToDAG.cpp +++ b/llvm/lib/Target/RISCV/RISCVISelDAGToDAG.cpp @@ -169,7 +169,8 @@ void RISCVDAGToDAGISel::selectVLSEG(SDNode *Node, unsigned IntNo, Operands.push_back(SEW); Operands.push_back(Node->getOperand(0)); // Chain. const RISCVZvlssegTable::RISCVZvlsseg *P = RISCVZvlssegTable::getPseudo( - IntNo, ScalarSize, static_cast(LMUL)); + IntNo, ScalarSize, static_cast(LMUL), + static_cast(RISCVVLMUL::LMUL_1)); SDNode *Load = CurDAG->getMachineNode(P->Pseudo, DL, MVT::Untyped, MVT::Other, Operands); SDValue SuperReg = SDValue(Load, 0); @@ -207,7 +208,79 @@ void RISCVDAGToDAGISel::selectVLSEGMask(SDNode *Node, unsigned IntNo, Operands.push_back(SEW); Operands.push_back(Node->getOperand(0)); /// Chain. 
const RISCVZvlssegTable::RISCVZvlsseg *P = RISCVZvlssegTable::getPseudo( - IntNo, ScalarSize, static_cast(LMUL)); + IntNo, ScalarSize, static_cast(LMUL), + static_cast(RISCVVLMUL::LMUL_1)); + SDNode *Load = + CurDAG->getMachineNode(P->Pseudo, DL, MVT::Untyped, MVT::Other, Operands); + SDValue SuperReg = SDValue(Load, 0); + for (unsigned I = 0; I < NF; ++I) +ReplaceUses(SDValue(Node, I), +CurDAG->getTargetExtractSubreg(getSubregIndexByEVT(VT, I), DL, + VT, SuperReg)); + + ReplaceUses(SDValue(Node, NF), SDValue(Load, 1)); + CurDAG->RemoveDeadNode(Node); +} + +void RISCVDAGToDAGISel::selectVLXSEG(SDNode *Node, unsigned IntNo) { + SDLoc DL(Node); + unsigned NF = Node->getNumValues() - 1; + EVT VT = Node->getValueType(0); + unsigned ScalarSize = VT.getScalarSizeInBits(); + MVT XLenVT = Subtarget->getXLenVT(); + RISCVVLMUL LMUL = getLMUL(VT); + SDValue SEW = CurDAG->getTargetConstant(ScalarS
[llvm-branch-commits] [llvm] e433715 - [NFC][RISCV] Move vmsge{u}.vx processing to RISCVAsmParser.
Author: Hsiangkai Wang Date: 2021-01-02T08:42:53+08:00 New Revision: e4337159e3d1c70b1ec58f43fa59c9f0fd693e51 URL: https://github.com/llvm/llvm-project/commit/e4337159e3d1c70b1ec58f43fa59c9f0fd693e51 DIFF: https://github.com/llvm/llvm-project/commit/e4337159e3d1c70b1ec58f43fa59c9f0fd693e51.diff LOG: [NFC][RISCV] Move vmsge{u}.vx processing to RISCVAsmParser. We could expand vmsge{u}.vx pseudo instructions in RISCVAsmParser. It is more appropriate to expand it before encoding. Differential Revision: https://reviews.llvm.org/D93968 Added: Modified: llvm/lib/Target/RISCV/AsmParser/RISCVAsmParser.cpp llvm/lib/Target/RISCV/MCTargetDesc/RISCVMCCodeEmitter.cpp llvm/test/MC/RISCV/rvv/compare.s Removed: diff --git a/llvm/lib/Target/RISCV/AsmParser/RISCVAsmParser.cpp b/llvm/lib/Target/RISCV/AsmParser/RISCVAsmParser.cpp index c735aaf5ec63..4172d33384bf 100644 --- a/llvm/lib/Target/RISCV/AsmParser/RISCVAsmParser.cpp +++ b/llvm/lib/Target/RISCV/AsmParser/RISCVAsmParser.cpp @@ -129,6 +129,9 @@ class RISCVAsmParser : public MCTargetAsmParser { void emitPseudoExtend(MCInst &Inst, bool SignExtend, int64_t Width, SMLoc IDLoc, MCStreamer &Out); + // Helper to emit pseudo vmsge{u}.vx instruction. + void emitVMSGE(MCInst &Inst, unsigned Opcode, SMLoc IDLoc, MCStreamer &Out); + // Checks that a PseudoAddTPRel is using x4/tp in its second input operand. 
// Enforcing this using a restricted register class for the second input // operand of PseudoAddTPRel results in a poor diagnostic due to the fact @@ -2257,6 +2260,59 @@ void RISCVAsmParser::emitPseudoExtend(MCInst &Inst, bool SignExtend, .addImm(ShAmt)); } +void RISCVAsmParser::emitVMSGE(MCInst &Inst, unsigned Opcode, SMLoc IDLoc, + MCStreamer &Out) { + if (Inst.getNumOperands() == 3) { +// unmasked va >= x +// +// pseudoinstruction: vmsge{u}.vx vd, va, x +// expansion: vmslt{u}.vx vd, va, x; vmnand.mm vd, vd, vd +emitToStreamer(Out, MCInstBuilder(Opcode) +.addOperand(Inst.getOperand(0)) +.addOperand(Inst.getOperand(1)) +.addOperand(Inst.getOperand(2)) +.addReg(RISCV::NoRegister)); +emitToStreamer(Out, MCInstBuilder(RISCV::VMNAND_MM) +.addOperand(Inst.getOperand(0)) +.addOperand(Inst.getOperand(0)) +.addOperand(Inst.getOperand(0))); + } else if (Inst.getNumOperands() == 4) { +// masked va >= x, vd != v0 +// +// pseudoinstruction: vmsge{u}.vx vd, va, x, v0.t +// expansion: vmslt{u}.vx vd, va, x, v0.t; vmxor.mm vd, vd, v0 +assert(Inst.getOperand(0).getReg() != RISCV::V0 && + "The destination register should not be V0."); +emitToStreamer(Out, MCInstBuilder(Opcode) +.addOperand(Inst.getOperand(0)) +.addOperand(Inst.getOperand(1)) +.addOperand(Inst.getOperand(2)) +.addOperand(Inst.getOperand(3))); +emitToStreamer(Out, MCInstBuilder(RISCV::VMXOR_MM) +.addOperand(Inst.getOperand(0)) +.addOperand(Inst.getOperand(0)) +.addReg(RISCV::V0)); + } else if (Inst.getNumOperands() == 5) { +// masked va >= x, vd == v0 +// +// pseudoinstruction: vmsge{u}.vx vd, va, x, v0.t, vt +// expansion: vmslt{u}.vx vt, va, x; vmandnot.mm vd, vd, vt +assert(Inst.getOperand(0).getReg() == RISCV::V0 && + "The destination register should be V0."); +assert(Inst.getOperand(1).getReg() != RISCV::V0 && + "The temporary vector register should not be V0."); +emitToStreamer(Out, MCInstBuilder(Opcode) +.addOperand(Inst.getOperand(1)) +.addOperand(Inst.getOperand(2)) +.addOperand(Inst.getOperand(3)) 
+.addOperand(Inst.getOperand(4))); +emitToStreamer(Out, MCInstBuilder(RISCV::VMANDNOT_MM) +.addOperand(Inst.getOperand(0)) +.addOperand(Inst.getOperand(0)) +.addOperand(Inst.getOperand(1))); + } +} + bool RISCVAsmParser::checkPseudoAddTPRel(MCInst &Inst, OperandVector &Operands) { assert(Inst.getOpcode() == RISCV::PseudoAddTPRel && "Invalid instruction"); @@ -2432,6 +2488,16 @@ bool RISCVAsmParser::processInstruction(MCInst &Inst, SMLoc IDLoc, case RISCV::PseudoZEXT_W: emitPseudoExtend(Inst, /*SignExtend=*/false, /*Width=*/32, IDLoc, Out); return false; + case RISCV::PseudoVMSGEU_VX: + case RISCV::PseudoVMSGEU_VX_M: + case RISCV::PseudoVMSGEU_VX_M_
[llvm-branch-commits] [llvm] 5e47606 - [NFC][AsmPrinter] Make comments for spill/reload more precise.
Author: Hsiangkai Wang Date: 2021-01-11T15:00:27+08:00 New Revision: 5e476061deb82ed4e6d440445f8830e1c7bccaa6 URL: https://github.com/llvm/llvm-project/commit/5e476061deb82ed4e6d440445f8830e1c7bccaa6 DIFF: https://github.com/llvm/llvm-project/commit/5e476061deb82ed4e6d440445f8830e1c7bccaa6.diff LOG: [NFC][AsmPrinter] Make comments for spill/reload more precise. The size of spill/reload may be unknown for scalable vector types. When the size is unknown, print it as "Unknown-size" instead of a very large number. Differential Revision: https://reviews.llvm.org/D94299 Added: Modified: llvm/lib/CodeGen/AsmPrinter/AsmPrinter.cpp Removed: diff --git a/llvm/lib/CodeGen/AsmPrinter/AsmPrinter.cpp b/llvm/lib/CodeGen/AsmPrinter/AsmPrinter.cpp index d72a91825061..f4749f8ca95d 100644 --- a/llvm/lib/CodeGen/AsmPrinter/AsmPrinter.cpp +++ b/llvm/lib/CodeGen/AsmPrinter/AsmPrinter.cpp @@ -31,6 +31,7 @@ #include "llvm/ADT/Twine.h" #include "llvm/Analysis/ConstantFolding.h" #include "llvm/Analysis/EHPersonalities.h" +#include "llvm/Analysis/MemoryLocation.h" #include "llvm/Analysis/OptimizationRemarkEmitter.h" #include "llvm/BinaryFormat/COFF.h" #include "llvm/BinaryFormat/Dwarf.h" @@ -844,13 +845,21 @@ static void emitComments(const MachineInstr &MI, raw_ostream &CommentOS) { if ((Size = MI.getRestoreSize(TII))) { CommentOS << *Size << "-byte Reload\n"; } else if ((Size = MI.getFoldedRestoreSize(TII))) { -if (*Size) - CommentOS << *Size << "-byte Folded Reload\n"; +if (*Size) { + if (*Size == static_cast(MemoryLocation::UnknownSize)) +CommentOS << "Unknown-size Folded Reload\n"; + else +CommentOS << *Size << "-byte Folded Reload\n"; +} } else if ((Size = MI.getSpillSize(TII))) { CommentOS << *Size << "-byte Spill\n"; } else if ((Size = MI.getFoldedSpillSize(TII))) { -if (*Size) - CommentOS << *Size << "-byte Folded Spill\n"; +if (*Size) { + if (*Size == static_cast(MemoryLocation::UnknownSize)) +CommentOS << "Unknown-size Folded Spill\n"; + else +CommentOS << *Size << "-byte Folded 
Spill\n"; +} } // Check for spill-induced copies
[llvm-branch-commits] [llvm] e0d43a3 - [RISCV][NFC] Define scalable vectors for half types.
Author: Hsiangkai Wang Date: 2020-12-15T13:49:54+08:00 New Revision: e0d43a3b37a0d14c1caf3a79a4ad57a7a75fc3ae URL: https://github.com/llvm/llvm-project/commit/e0d43a3b37a0d14c1caf3a79a4ad57a7a75fc3ae DIFF: https://github.com/llvm/llvm-project/commit/e0d43a3b37a0d14c1caf3a79a4ad57a7a75fc3ae.diff LOG: [RISCV][NFC] Define scalable vectors for half types. This is preparatory work for the vfadd intrinsics. Added: Modified: llvm/lib/Target/RISCV/RISCVISelLowering.cpp llvm/lib/Target/RISCV/RISCVRegisterInfo.td Removed: diff --git a/llvm/lib/Target/RISCV/RISCVISelLowering.cpp b/llvm/lib/Target/RISCV/RISCVISelLowering.cpp index a6aa81be1e40..529a5bf784f4 100644 --- a/llvm/lib/Target/RISCV/RISCVISelLowering.cpp +++ b/llvm/lib/Target/RISCV/RISCVISelLowering.cpp @@ -124,6 +124,13 @@ RISCVTargetLowering::RISCVTargetLowering(const TargetMachine &TM, addRegisterClass(RISCVVMVTs::vint64m4_t, &RISCV::VRM4RegClass); addRegisterClass(RISCVVMVTs::vint64m8_t, &RISCV::VRM8RegClass); +addRegisterClass(RISCVVMVTs::vfloat16mf4_t, &RISCV::VRRegClass); +addRegisterClass(RISCVVMVTs::vfloat16mf2_t, &RISCV::VRRegClass); +addRegisterClass(RISCVVMVTs::vfloat16m1_t, &RISCV::VRRegClass); +addRegisterClass(RISCVVMVTs::vfloat16m2_t, &RISCV::VRM2RegClass); +addRegisterClass(RISCVVMVTs::vfloat16m4_t, &RISCV::VRM4RegClass); +addRegisterClass(RISCVVMVTs::vfloat16m8_t, &RISCV::VRM8RegClass); + addRegisterClass(RISCVVMVTs::vfloat32mf2_t, &RISCV::VRRegClass); addRegisterClass(RISCVVMVTs::vfloat32m1_t, &RISCV::VRRegClass); addRegisterClass(RISCVVMVTs::vfloat32m2_t, &RISCV::VRM2RegClass); diff --git a/llvm/lib/Target/RISCV/RISCVRegisterInfo.td b/llvm/lib/Target/RISCV/RISCVRegisterInfo.td index ea7357c9c073..b69cdde6c532 100644 --- a/llvm/lib/Target/RISCV/RISCVRegisterInfo.td +++ b/llvm/lib/Target/RISCV/RISCVRegisterInfo.td @@ -314,6 +314,13 @@ defvar vint64m2_t = nxv2i64; defvar vint64m4_t = nxv4i64; defvar vint64m8_t = nxv8i64; +defvar vfloat16mf4_t = nxv1f16; +defvar vfloat16mf2_t = nxv2f16; +defvar
vfloat16m1_t = nxv4f16; +defvar vfloat16m2_t = nxv8f16; +defvar vfloat16m4_t = nxv16f16; +defvar vfloat16m8_t = nxv32f16; + defvar vfloat32mf2_t = nxv1f32; defvar vfloat32m1_t = nxv2f32; defvar vfloat32m2_t = nxv4f32; @@ -391,6 +398,7 @@ class VReg regTypes, dag regList, int Vlmul> def VR : VReg<[vint8mf2_t, vint8mf4_t, vint8mf8_t, vint16mf2_t, vint16mf4_t, vint32mf2_t, vint8m1_t, vint16m1_t, vint32m1_t, vint64m1_t, + vfloat16mf4_t, vfloat16mf2_t, vfloat16m1_t, vfloat32mf2_t, vfloat32m1_t, vfloat64m1_t, vbool64_t, vbool32_t, vbool16_t, vbool8_t, vbool4_t, vbool2_t, vbool1_t], @@ -401,6 +409,7 @@ def VR : VReg<[vint8mf2_t, vint8mf4_t, vint8mf8_t, def VRNoV0 : VReg<[vint8mf2_t, vint8mf4_t, vint8mf8_t, vint16mf2_t, vint16mf4_t, vint32mf2_t, vint8m1_t, vint16m1_t, vint32m1_t, vint64m1_t, + vfloat16mf4_t, vfloat16mf2_t, vfloat16m1_t, vfloat32mf2_t, vfloat32m1_t, vfloat64m1_t, vbool64_t, vbool32_t, vbool16_t, vbool8_t, vbool4_t, vbool2_t, vbool1_t], @@ -409,29 +418,29 @@ def VRNoV0 : VReg<[vint8mf2_t, vint8mf4_t, vint8mf8_t, (sequence "V%u", 1, 7)), 1>; def VRM2 : VReg<[vint8m2_t, vint16m2_t, vint32m2_t, vint64m2_t, - vfloat32m2_t, vfloat64m2_t], + vfloat16m2_t, vfloat32m2_t, vfloat64m2_t], (add V26M2, V28M2, V30M2, V8M2, V10M2, V12M2, V14M2, V16M2, V18M2, V20M2, V22M2, V24M2, V0M2, V2M2, V4M2, V6M2), 2>; def VRM2NoV0 : VReg<[vint8m2_t, vint16m2_t, vint32m2_t, vint64m2_t, - vfloat32m2_t, vfloat64m2_t], + vfloat16m2_t, vfloat32m2_t, vfloat64m2_t], (add V26M2, V28M2, V30M2, V8M2, V10M2, V12M2, V14M2, V16M2, V18M2, V20M2, V22M2, V24M2, V2M2, V4M2, V6M2), 2>; def VRM4 : VReg<[vint8m4_t, vint16m4_t, vint32m4_t, vint64m4_t, - vfloat32m4_t, vfloat64m4_t], + vfloat16m4_t, vfloat32m4_t, vfloat64m4_t], (add V28M4, V8M4, V12M4, V16M4, V20M4, V24M4, V0M4, V4M4), 4>; def VRM4NoV0 : VReg<[vint8m4_t, vint16m4_t, vint32m4_t, vint64m4_t, - vfloat32m4_t, vfloat64m4_t], + vfloat16m4_t, vfloat32m4_t, vfloat64m4_t], (add V28M4, V8M4, V12M4, V16M4, V20M4, V24M4, V4M4), 4>; def VRM8 : 
VReg<[vint8m8_t, vint16m8_t, vint32m8_t, vint64m8_t, - vfloat32m8_t, vfloat64m8_t], + vfloat16m8_t, vfloat32m8_t, vfloat64m8_t], (add V8M8, V16M8, V24M8, V0M8), 8>; def VRM8NoV0 : VReg<[vint8m8_t, vint16m8_t, vint32m8_t, vint64m8_t, - vfloat32m8_t, vfloat64m8_t], +
[llvm-branch-commits] [llvm] cbbdcfc - [RISCV] V does not imply F.
Author: Hsiangkai Wang Date: 2020-12-15T14:59:22+08:00 New Revision: cbbdcfc47c86ddafe4ebad49f93dd37b513db0ac URL: https://github.com/llvm/llvm-project/commit/cbbdcfc47c86ddafe4ebad49f93dd37b513db0ac DIFF: https://github.com/llvm/llvm-project/commit/cbbdcfc47c86ddafe4ebad49f93dd37b513db0ac.diff LOG: [RISCV] V does not imply F. If users want to use vector floating point instructions, they need to specify 'F' extension additionally. Added: Modified: llvm/lib/Target/RISCV/RISCV.td llvm/lib/Target/RISCV/RISCVISelLowering.cpp llvm/lib/Target/RISCV/RISCVInstrInfoV.td llvm/lib/Target/RISCV/RISCVSchedRocket.td llvm/lib/Target/RISCV/RISCVSchedSiFive7.td llvm/test/MC/RISCV/rvv/convert.s llvm/test/MC/RISCV/rvv/fadd.s llvm/test/MC/RISCV/rvv/fcompare.s llvm/test/MC/RISCV/rvv/fdiv.s llvm/test/MC/RISCV/rvv/fmacc.s llvm/test/MC/RISCV/rvv/fminmax.s llvm/test/MC/RISCV/rvv/fmul.s llvm/test/MC/RISCV/rvv/fmv.s llvm/test/MC/RISCV/rvv/fothers.s llvm/test/MC/RISCV/rvv/freduction.s llvm/test/MC/RISCV/rvv/fsub.s llvm/test/MC/RISCV/rvv/sign-injection.s Removed: diff --git a/llvm/lib/Target/RISCV/RISCV.td b/llvm/lib/Target/RISCV/RISCV.td index 7a7b3bb0ad32..b1f2658e76b8 100644 --- a/llvm/lib/Target/RISCV/RISCV.td +++ b/llvm/lib/Target/RISCV/RISCV.td @@ -160,11 +160,14 @@ def HasRVCHints : Predicate<"Subtarget->enableRVCHintInstrs()">, def FeatureStdExtV : SubtargetFeature<"experimental-v", "HasStdExtV", "true", - "'V' (Vector Instructions)", - [FeatureStdExtF]>; + "'V' (Vector Instructions)">; def HasStdExtV : Predicate<"Subtarget->hasStdExtV()">, AssemblerPredicate<(all_of FeatureStdExtV), "'V' (Vector Instructions)">; +def HasStdExtVAndF +: Predicate<"Subtarget->hasStdExtV() && Subtarget->hasStdExtF()">, +AssemblerPredicate<(all_of FeatureStdExtV, FeatureStdExtF), +"'V' and 'F' (Vector Floating-point Instructions)">; def FeatureStdExtZvlsseg : SubtargetFeature<"experimental-zvlsseg", "HasStdExtZvlsseg", "true", diff --git a/llvm/lib/Target/RISCV/RISCVISelLowering.cpp 
b/llvm/lib/Target/RISCV/RISCVISelLowering.cpp index 529a5bf784f4..f805a7dcc60c 100644 --- a/llvm/lib/Target/RISCV/RISCVISelLowering.cpp +++ b/llvm/lib/Target/RISCV/RISCVISelLowering.cpp @@ -124,23 +124,29 @@ RISCVTargetLowering::RISCVTargetLowering(const TargetMachine &TM, addRegisterClass(RISCVVMVTs::vint64m4_t, &RISCV::VRM4RegClass); addRegisterClass(RISCVVMVTs::vint64m8_t, &RISCV::VRM8RegClass); -addRegisterClass(RISCVVMVTs::vfloat16mf4_t, &RISCV::VRRegClass); -addRegisterClass(RISCVVMVTs::vfloat16mf2_t, &RISCV::VRRegClass); -addRegisterClass(RISCVVMVTs::vfloat16m1_t, &RISCV::VRRegClass); -addRegisterClass(RISCVVMVTs::vfloat16m2_t, &RISCV::VRM2RegClass); -addRegisterClass(RISCVVMVTs::vfloat16m4_t, &RISCV::VRM4RegClass); -addRegisterClass(RISCVVMVTs::vfloat16m8_t, &RISCV::VRM8RegClass); - -addRegisterClass(RISCVVMVTs::vfloat32mf2_t, &RISCV::VRRegClass); -addRegisterClass(RISCVVMVTs::vfloat32m1_t, &RISCV::VRRegClass); -addRegisterClass(RISCVVMVTs::vfloat32m2_t, &RISCV::VRM2RegClass); -addRegisterClass(RISCVVMVTs::vfloat32m4_t, &RISCV::VRM4RegClass); -addRegisterClass(RISCVVMVTs::vfloat32m8_t, &RISCV::VRM8RegClass); - -addRegisterClass(RISCVVMVTs::vfloat64m1_t, &RISCV::VRRegClass); -addRegisterClass(RISCVVMVTs::vfloat64m2_t, &RISCV::VRM2RegClass); -addRegisterClass(RISCVVMVTs::vfloat64m4_t, &RISCV::VRM4RegClass); -addRegisterClass(RISCVVMVTs::vfloat64m8_t, &RISCV::VRM8RegClass); +if (Subtarget.hasStdExtZfh()) { + addRegisterClass(RISCVVMVTs::vfloat16mf4_t, &RISCV::VRRegClass); + addRegisterClass(RISCVVMVTs::vfloat16mf2_t, &RISCV::VRRegClass); + addRegisterClass(RISCVVMVTs::vfloat16m1_t, &RISCV::VRRegClass); + addRegisterClass(RISCVVMVTs::vfloat16m2_t, &RISCV::VRM2RegClass); + addRegisterClass(RISCVVMVTs::vfloat16m4_t, &RISCV::VRM4RegClass); + addRegisterClass(RISCVVMVTs::vfloat16m8_t, &RISCV::VRM8RegClass); +} + +if (Subtarget.hasStdExtF()) { + addRegisterClass(RISCVVMVTs::vfloat32mf2_t, &RISCV::VRRegClass); + addRegisterClass(RISCVVMVTs::vfloat32m1_t, 
&RISCV::VRRegClass); + addRegisterClass(RISCVVMVTs::vfloat32m2_t, &RISCV::VRM2RegClass); + addRegisterClass(RISCVVMVTs::vfloat32m4_t, &RISCV::VRM4RegClass); + addRegisterClass(RISCVVMVTs::vfloat32m8_t, &RISCV::VRM8RegClass); +} + +if (Subtarget.hasStdExtD()) { + addRegisterClass(RISCVVMVTs::vfloat64m1_t, &RISCV::VRRegClass); + addRegisterClass(RISCVVMVTs::vfloat64m2_t, &RISCV::VRM2RegClass); + addRegisterClass(RISCVVMVTs::vfloat64m4_t, &RISCV::VRM4RegClass); + addRegisterClass(RISCVVMVTs::vfloat64m8_t, &RISCV:
[llvm-branch-commits] [llvm] 14a91d6 - [RISCV][NFC] Define scalable vectors for half types.
Author: Hsiangkai Wang Date: 2020-12-15T16:23:22+08:00 New Revision: 14a91d676b794db09c14abecf363650a8fc90c61 URL: https://github.com/llvm/llvm-project/commit/14a91d676b794db09c14abecf363650a8fc90c61 DIFF: https://github.com/llvm/llvm-project/commit/14a91d676b794db09c14abecf363650a8fc90c61.diff LOG: [RISCV][NFC] Define scalable vectors for half types. This is preparatory work for vfadd intrinsics. Differential Revision: https://reviews.llvm.org/D93275 Added: Modified: llvm/lib/Target/RISCV/RISCVISelLowering.cpp llvm/lib/Target/RISCV/RISCVRegisterInfo.td Removed: diff --git a/llvm/lib/Target/RISCV/RISCVISelLowering.cpp b/llvm/lib/Target/RISCV/RISCVISelLowering.cpp index a6aa81be1e40..529a5bf784f4 100644 --- a/llvm/lib/Target/RISCV/RISCVISelLowering.cpp +++ b/llvm/lib/Target/RISCV/RISCVISelLowering.cpp @@ -124,6 +124,13 @@ RISCVTargetLowering::RISCVTargetLowering(const TargetMachine &TM, addRegisterClass(RISCVVMVTs::vint64m4_t, &RISCV::VRM4RegClass); addRegisterClass(RISCVVMVTs::vint64m8_t, &RISCV::VRM8RegClass); +addRegisterClass(RISCVVMVTs::vfloat16mf4_t, &RISCV::VRRegClass); +addRegisterClass(RISCVVMVTs::vfloat16mf2_t, &RISCV::VRRegClass); +addRegisterClass(RISCVVMVTs::vfloat16m1_t, &RISCV::VRRegClass); +addRegisterClass(RISCVVMVTs::vfloat16m2_t, &RISCV::VRM2RegClass); +addRegisterClass(RISCVVMVTs::vfloat16m4_t, &RISCV::VRM4RegClass); +addRegisterClass(RISCVVMVTs::vfloat16m8_t, &RISCV::VRM8RegClass); + addRegisterClass(RISCVVMVTs::vfloat32mf2_t, &RISCV::VRRegClass); addRegisterClass(RISCVVMVTs::vfloat32m1_t, &RISCV::VRRegClass); addRegisterClass(RISCVVMVTs::vfloat32m2_t, &RISCV::VRM2RegClass); diff --git a/llvm/lib/Target/RISCV/RISCVRegisterInfo.td b/llvm/lib/Target/RISCV/RISCVRegisterInfo.td index ea7357c9c073..b69cdde6c532 100644 --- a/llvm/lib/Target/RISCV/RISCVRegisterInfo.td +++ b/llvm/lib/Target/RISCV/RISCVRegisterInfo.td @@ -314,6 +314,13 @@ defvar vint64m2_t = nxv2i64; defvar vint64m4_t = nxv4i64; defvar vint64m8_t = nxv8i64; +defvar vfloat16mf4_t =
nxv1f16; +defvar vfloat16mf2_t = nxv2f16; +defvar vfloat16m1_t = nxv4f16; +defvar vfloat16m2_t = nxv8f16; +defvar vfloat16m4_t = nxv16f16; +defvar vfloat16m8_t = nxv32f16; + defvar vfloat32mf2_t = nxv1f32; defvar vfloat32m1_t = nxv2f32; defvar vfloat32m2_t = nxv4f32; @@ -391,6 +398,7 @@ class VReg regTypes, dag regList, int Vlmul> def VR : VReg<[vint8mf2_t, vint8mf4_t, vint8mf8_t, vint16mf2_t, vint16mf4_t, vint32mf2_t, vint8m1_t, vint16m1_t, vint32m1_t, vint64m1_t, + vfloat16mf4_t, vfloat16mf2_t, vfloat16m1_t, vfloat32mf2_t, vfloat32m1_t, vfloat64m1_t, vbool64_t, vbool32_t, vbool16_t, vbool8_t, vbool4_t, vbool2_t, vbool1_t], @@ -401,6 +409,7 @@ def VR : VReg<[vint8mf2_t, vint8mf4_t, vint8mf8_t, def VRNoV0 : VReg<[vint8mf2_t, vint8mf4_t, vint8mf8_t, vint16mf2_t, vint16mf4_t, vint32mf2_t, vint8m1_t, vint16m1_t, vint32m1_t, vint64m1_t, + vfloat16mf4_t, vfloat16mf2_t, vfloat16m1_t, vfloat32mf2_t, vfloat32m1_t, vfloat64m1_t, vbool64_t, vbool32_t, vbool16_t, vbool8_t, vbool4_t, vbool2_t, vbool1_t], @@ -409,29 +418,29 @@ def VRNoV0 : VReg<[vint8mf2_t, vint8mf4_t, vint8mf8_t, (sequence "V%u", 1, 7)), 1>; def VRM2 : VReg<[vint8m2_t, vint16m2_t, vint32m2_t, vint64m2_t, - vfloat32m2_t, vfloat64m2_t], + vfloat16m2_t, vfloat32m2_t, vfloat64m2_t], (add V26M2, V28M2, V30M2, V8M2, V10M2, V12M2, V14M2, V16M2, V18M2, V20M2, V22M2, V24M2, V0M2, V2M2, V4M2, V6M2), 2>; def VRM2NoV0 : VReg<[vint8m2_t, vint16m2_t, vint32m2_t, vint64m2_t, - vfloat32m2_t, vfloat64m2_t], + vfloat16m2_t, vfloat32m2_t, vfloat64m2_t], (add V26M2, V28M2, V30M2, V8M2, V10M2, V12M2, V14M2, V16M2, V18M2, V20M2, V22M2, V24M2, V2M2, V4M2, V6M2), 2>; def VRM4 : VReg<[vint8m4_t, vint16m4_t, vint32m4_t, vint64m4_t, - vfloat32m4_t, vfloat64m4_t], + vfloat16m4_t, vfloat32m4_t, vfloat64m4_t], (add V28M4, V8M4, V12M4, V16M4, V20M4, V24M4, V0M4, V4M4), 4>; def VRM4NoV0 : VReg<[vint8m4_t, vint16m4_t, vint32m4_t, vint64m4_t, - vfloat32m4_t, vfloat64m4_t], + vfloat16m4_t, vfloat32m4_t, vfloat64m4_t], (add V28M4, V8M4, V12M4, 
V16M4, V20M4, V24M4, V4M4), 4>; def VRM8 : VReg<[vint8m8_t, vint16m8_t, vint32m8_t, vint64m8_t, - vfloat32m8_t, vfloat64m8_t], + vfloat16m8_t, vfloat32m8_t, vfloat64m8_t], (add V8M8, V16M8, V24M8, V0M8), 8>; def VRM8NoV0 : VReg<[vint8m8_t, vint16m8_t, vint32m8_t, vint64m8_
[llvm-branch-commits] [llvm] f03609b - [RISCV] V does not imply F.
Author: Hsiangkai Wang Date: 2020-12-17T10:57:36+08:00 New Revision: f03609b5c7531061be659e36824d37ef86a1fdf4 URL: https://github.com/llvm/llvm-project/commit/f03609b5c7531061be659e36824d37ef86a1fdf4 DIFF: https://github.com/llvm/llvm-project/commit/f03609b5c7531061be659e36824d37ef86a1fdf4.diff LOG: [RISCV] V does not imply F. If users want to use vector floating point instructions, they need to specify 'F' extension additionally. Differential Revision: https://reviews.llvm.org/D93282 Added: Modified: llvm/lib/Target/RISCV/RISCV.td llvm/lib/Target/RISCV/RISCVISelLowering.cpp llvm/lib/Target/RISCV/RISCVInstrInfoV.td llvm/test/CodeGen/RISCV/rvv/vle-rv32.ll llvm/test/CodeGen/RISCV/rvv/vle-rv64.ll llvm/test/CodeGen/RISCV/rvv/vse-rv32.ll llvm/test/CodeGen/RISCV/rvv/vse-rv64.ll llvm/test/MC/RISCV/rvv/convert.s llvm/test/MC/RISCV/rvv/fadd.s llvm/test/MC/RISCV/rvv/fcompare.s llvm/test/MC/RISCV/rvv/fdiv.s llvm/test/MC/RISCV/rvv/fmacc.s llvm/test/MC/RISCV/rvv/fminmax.s llvm/test/MC/RISCV/rvv/fmul.s llvm/test/MC/RISCV/rvv/fmv.s llvm/test/MC/RISCV/rvv/fothers.s llvm/test/MC/RISCV/rvv/freduction.s llvm/test/MC/RISCV/rvv/fsub.s llvm/test/MC/RISCV/rvv/sign-injection.s Removed: diff --git a/llvm/lib/Target/RISCV/RISCV.td b/llvm/lib/Target/RISCV/RISCV.td index 7a7b3bb0ad32..56339d7df52d 100644 --- a/llvm/lib/Target/RISCV/RISCV.td +++ b/llvm/lib/Target/RISCV/RISCV.td @@ -160,8 +160,7 @@ def HasRVCHints : Predicate<"Subtarget->enableRVCHintInstrs()">, def FeatureStdExtV : SubtargetFeature<"experimental-v", "HasStdExtV", "true", - "'V' (Vector Instructions)", - [FeatureStdExtF]>; + "'V' (Vector Instructions)">; def HasStdExtV : Predicate<"Subtarget->hasStdExtV()">, AssemblerPredicate<(all_of FeatureStdExtV), "'V' (Vector Instructions)">; diff --git a/llvm/lib/Target/RISCV/RISCVISelLowering.cpp b/llvm/lib/Target/RISCV/RISCVISelLowering.cpp index c0202e3f19e0..aeb6b0623862 100644 --- a/llvm/lib/Target/RISCV/RISCVISelLowering.cpp +++ b/llvm/lib/Target/RISCV/RISCVISelLowering.cpp @@ 
-124,23 +124,29 @@ RISCVTargetLowering::RISCVTargetLowering(const TargetMachine &TM, addRegisterClass(RISCVVMVTs::vint64m4_t, &RISCV::VRM4RegClass); addRegisterClass(RISCVVMVTs::vint64m8_t, &RISCV::VRM8RegClass); -addRegisterClass(RISCVVMVTs::vfloat16mf4_t, &RISCV::VRRegClass); -addRegisterClass(RISCVVMVTs::vfloat16mf2_t, &RISCV::VRRegClass); -addRegisterClass(RISCVVMVTs::vfloat16m1_t, &RISCV::VRRegClass); -addRegisterClass(RISCVVMVTs::vfloat16m2_t, &RISCV::VRM2RegClass); -addRegisterClass(RISCVVMVTs::vfloat16m4_t, &RISCV::VRM4RegClass); -addRegisterClass(RISCVVMVTs::vfloat16m8_t, &RISCV::VRM8RegClass); - -addRegisterClass(RISCVVMVTs::vfloat32mf2_t, &RISCV::VRRegClass); -addRegisterClass(RISCVVMVTs::vfloat32m1_t, &RISCV::VRRegClass); -addRegisterClass(RISCVVMVTs::vfloat32m2_t, &RISCV::VRM2RegClass); -addRegisterClass(RISCVVMVTs::vfloat32m4_t, &RISCV::VRM4RegClass); -addRegisterClass(RISCVVMVTs::vfloat32m8_t, &RISCV::VRM8RegClass); - -addRegisterClass(RISCVVMVTs::vfloat64m1_t, &RISCV::VRRegClass); -addRegisterClass(RISCVVMVTs::vfloat64m2_t, &RISCV::VRM2RegClass); -addRegisterClass(RISCVVMVTs::vfloat64m4_t, &RISCV::VRM4RegClass); -addRegisterClass(RISCVVMVTs::vfloat64m8_t, &RISCV::VRM8RegClass); +if (Subtarget.hasStdExtZfh()) { + addRegisterClass(RISCVVMVTs::vfloat16mf4_t, &RISCV::VRRegClass); + addRegisterClass(RISCVVMVTs::vfloat16mf2_t, &RISCV::VRRegClass); + addRegisterClass(RISCVVMVTs::vfloat16m1_t, &RISCV::VRRegClass); + addRegisterClass(RISCVVMVTs::vfloat16m2_t, &RISCV::VRM2RegClass); + addRegisterClass(RISCVVMVTs::vfloat16m4_t, &RISCV::VRM4RegClass); + addRegisterClass(RISCVVMVTs::vfloat16m8_t, &RISCV::VRM8RegClass); +} + +if (Subtarget.hasStdExtF()) { + addRegisterClass(RISCVVMVTs::vfloat32mf2_t, &RISCV::VRRegClass); + addRegisterClass(RISCVVMVTs::vfloat32m1_t, &RISCV::VRRegClass); + addRegisterClass(RISCVVMVTs::vfloat32m2_t, &RISCV::VRM2RegClass); + addRegisterClass(RISCVVMVTs::vfloat32m4_t, &RISCV::VRM4RegClass); + addRegisterClass(RISCVVMVTs::vfloat32m8_t, 
&RISCV::VRM8RegClass); +} + +if (Subtarget.hasStdExtD()) { + addRegisterClass(RISCVVMVTs::vfloat64m1_t, &RISCV::VRRegClass); + addRegisterClass(RISCVVMVTs::vfloat64m2_t, &RISCV::VRM2RegClass); + addRegisterClass(RISCVVMVTs::vfloat64m4_t, &RISCV::VRM4RegClass); + addRegisterClass(RISCVVMVTs::vfloat64m8_t, &RISCV::VRM8RegClass); +} } // Compute derived properties from the register classes. diff --git a/llvm/lib/Target/RISCV/RISCVInstrInfoV.td b/llvm/lib/Target/RISCV/RISCVInstrInfoV.td index 220aa9acc7
[llvm-branch-commits] [llvm] 7087ae7 - [RISCV] Remove NoVReg to avoid compile warning messages.
Author: Hsiangkai Wang Date: 2020-12-18T11:37:47+08:00 New Revision: 7087ae7be9f00b95d14bfba41264bbbd8f8711f2 URL: https://github.com/llvm/llvm-project/commit/7087ae7be9f00b95d14bfba41264bbbd8f8711f2 DIFF: https://github.com/llvm/llvm-project/commit/7087ae7be9f00b95d14bfba41264bbbd8f8711f2.diff LOG: [RISCV] Remove NoVReg to avoid compile warning messages. Added: Modified: llvm/lib/Target/RISCV/RISCVInstrInfoVPseudos.td llvm/lib/Target/RISCV/RISCVRegisterInfo.td Removed: diff --git a/llvm/lib/Target/RISCV/RISCVInstrInfoVPseudos.td b/llvm/lib/Target/RISCV/RISCVInstrInfoVPseudos.td index 7f5210310df7..3363aed34f39 100644 --- a/llvm/lib/Target/RISCV/RISCVInstrInfoVPseudos.td +++ b/llvm/lib/Target/RISCV/RISCVInstrInfoVPseudos.td @@ -47,7 +47,7 @@ class LMULInfo { def V_M1 : LMULInfo<0b000, VR, VRM2, "M1">; def V_M2 : LMULInfo<0b001, VRM2, VRM4, "M2">; def V_M4 : LMULInfo<0b010, VRM4, VRM8, "M4">; -def V_M8 : LMULInfo<0b011, VRM8, NoVReg, "M8">; +def V_M8 : LMULInfo<0b011, VRM8, VR, "M8">; def V_MF8 : LMULInfo<0b101, VR, VR, "MF8">; def V_MF4 : LMULInfo<0b110, VR, VR, "MF4">; diff --git a/llvm/lib/Target/RISCV/RISCVRegisterInfo.td b/llvm/lib/Target/RISCV/RISCVRegisterInfo.td index b87658fea59a..442cb2e4b0b8 100644 --- a/llvm/lib/Target/RISCV/RISCVRegisterInfo.td +++ b/llvm/lib/Target/RISCV/RISCVRegisterInfo.td @@ -396,9 +396,6 @@ class VReg regTypes, dag regList, int Vlmul> int Size = !mul(Vlmul, 64); // FIXME: assuming ELEN=64 } -// Dummy V register class. -def NoVReg : VReg<[vint8m1_t], (add V0), 0>; - def VR : VReg<[vint8mf2_t, vint8mf4_t, vint8mf8_t, vint16mf2_t, vint16mf4_t, vint32mf2_t, vint8m1_t, vint16m1_t, vint32m1_t, vint64m1_t, ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] 41ab45d - [RISCV] Define vector vfwmul intrinsics.
Author: Hsiangkai Wang Date: 2020-12-20T17:39:20+08:00 New Revision: 41ab45d6624602ba10486f044c0fd06db5b9bedb URL: https://github.com/llvm/llvm-project/commit/41ab45d6624602ba10486f044c0fd06db5b9bedb DIFF: https://github.com/llvm/llvm-project/commit/41ab45d6624602ba10486f044c0fd06db5b9bedb.diff LOG: [RISCV] Define vector vfwmul intrinsics. Define vector vfwmul intrinsics and lower them to V instructions. We worked with @rogfer01 from BSC to develop this patch. Authored-by: Roger Ferrer Ibanez Co-Authored-by: Hsiangkai Wang Differential Revision: https://reviews.llvm.org/D93584 Added: llvm/test/CodeGen/RISCV/rvv/vfwmul-rv32.ll llvm/test/CodeGen/RISCV/rvv/vfwmul-rv64.ll Modified: llvm/include/llvm/IR/IntrinsicsRISCV.td llvm/lib/Target/RISCV/RISCVInstrInfoVPseudos.td Removed: diff --git a/llvm/include/llvm/IR/IntrinsicsRISCV.td b/llvm/include/llvm/IR/IntrinsicsRISCV.td index 015585780e58..df289d9714f7 100644 --- a/llvm/include/llvm/IR/IntrinsicsRISCV.td +++ b/llvm/include/llvm/IR/IntrinsicsRISCV.td @@ -423,6 +423,8 @@ let TargetPrefix = "riscv" in { defm vfdiv : RISCVBinaryAAX; defm vfrdiv : RISCVBinaryAAX; + defm vfwmul : RISCVBinaryABX; + defm vfsgnj : RISCVBinaryAAX; defm vfsgnjn : RISCVBinaryAAX; defm vfsgnjx : RISCVBinaryAAX; diff --git a/llvm/lib/Target/RISCV/RISCVInstrInfoVPseudos.td b/llvm/lib/Target/RISCV/RISCVInstrInfoVPseudos.td index 150cf58b0339..52c4211a5855 100644 --- a/llvm/lib/Target/RISCV/RISCVInstrInfoVPseudos.td +++ b/llvm/lib/Target/RISCV/RISCVInstrInfoVPseudos.td @@ -1543,6 +1543,11 @@ defm PseudoVFMUL : VPseudoBinaryV_VV_VX; defm PseudoVFDIV : VPseudoBinaryV_VV_VX; defm PseudoVFRDIV : VPseudoBinaryV_VX; +//===--===// +// 14.5. Vector Widening Floating-Point Multiply +//===--===// +defm PseudoVFWMUL : VPseudoBinaryW_VV_VX; + //===--===// // 14.12. 
Vector Floating-Point Sign-Injection Instructions //===--===// @@ -1829,6 +1834,11 @@ defm "" : VPatBinaryV_VV_VX<"int_riscv_vfmul", "PseudoVFMUL", AllFloatVectors>; defm "" : VPatBinaryV_VV_VX<"int_riscv_vfdiv", "PseudoVFDIV", AllFloatVectors>; defm "" : VPatBinaryV_VX<"int_riscv_vfrdiv", "PseudoVFRDIV", AllFloatVectors>; +//===--===// +// 14.5. Vector Widening Floating-Point Multiply +//===--===// +defm "" : VPatBinaryW_VV_VX<"int_riscv_vfwmul", "PseudoVFWMUL", AllWidenableFloatVectors>; + //===--===// // 14.12. Vector Floating-Point Sign-Injection Instructions //===--===// diff --git a/llvm/test/CodeGen/RISCV/rvv/vfwmul-rv32.ll b/llvm/test/CodeGen/RISCV/rvv/vfwmul-rv32.ll new file mode 100644 index ..80448534d1c1 --- /dev/null +++ b/llvm/test/CodeGen/RISCV/rvv/vfwmul-rv32.ll @@ -0,0 +1,401 @@ +; RUN: llc -mtriple=riscv32 -mattr=+experimental-v,+f,+experimental-zfh -verify-machineinstrs \ +; RUN: --riscv-no-aliases < %s | FileCheck %s +declare @llvm.riscv.vfwmul.nxv1f16( + , + , + i32); + +define @intrinsic_vfwmul_vv_nxv1f16_nxv1f16( %0, %1, i32 %2) nounwind { +entry: +; CHECK-LABEL: intrinsic_vfwmul_vv_nxv1f16_nxv1f16 +; CHECK: vsetvli {{.*}}, {{a[0-9]+}}, e16,mf4,ta,mu +; CHECK: vfwmul.vv {{v[0-9]+}}, {{v[0-9]+}}, {{v[0-9]+}} + %a = call @llvm.riscv.vfwmul.nxv1f16( + %0, + %1, +i32 %2) + + ret %a +} + +declare @llvm.riscv.vfwmul.mask.nxv1f16( + , + , + , + , + i32); + +define @intrinsic_vfwmul_mask_vv_nxv1f16_nxv1f16( %0, %1, %2, %3, i32 %4) nounwind { +entry: +; CHECK-LABEL: intrinsic_vfwmul_mask_vv_nxv1f16_nxv1f16 +; CHECK: vsetvli {{.*}}, {{a[0-9]+}}, e16,mf4,ta,mu +; CHECK: vfwmul.vv {{v[0-9]+}}, {{v[0-9]+}}, {{v[0-9]+}}, v0.t + %a = call @llvm.riscv.vfwmul.mask.nxv1f16( + %0, + %1, + %2, + %3, +i32 %4) + + ret %a +} + +declare @llvm.riscv.vfwmul.nxv2f16( + , + , + i32); + +define @intrinsic_vfwmul_vv_nxv2f16_nxv2f16( %0, %1, i32 %2) nounwind { +entry: +; CHECK-LABEL: intrinsic_vfwmul_vv_nxv2f16_nxv2f16 +; CHECK: vsetvli {{.*}}, {{a[0-9]+}}, e16,mf2,ta,mu 
+; CHECK: vfwmul.vv {{v[0-9]+}}, {{v[0-9]+}}, {{v[0-9]+}} + %a = call @llvm.riscv.vfwmul.nxv2f16( + %0, + %1, +i32 %2) + + ret %a +} + +declare @llvm.riscv.vfwmul.mask.nxv2f16( + , + , + , + , + i32); + +define @intrinsic_vfwmul_mask_vv_nxv2f16_nxv2f16( %0, %1, %2, %3, i32 %4) nounwind { +entry: +; CHECK-LABEL: intrinsic_vfwmul_ma
[llvm-branch-commits] [clang] 432d051 - [RISCV] Handle zfh in the arch string.
Author: Hsiangkai Wang Date: 2020-12-03T09:16:44+08:00 New Revision: 432d05174ed00a217c0ad37e2e823154624c1311 URL: https://github.com/llvm/llvm-project/commit/432d05174ed00a217c0ad37e2e823154624c1311 DIFF: https://github.com/llvm/llvm-project/commit/432d05174ed00a217c0ad37e2e823154624c1311.diff LOG: [RISCV] Handle zfh in the arch string. Differential Revision: https://reviews.llvm.org/D91315 Added: Modified: clang/lib/Basic/Targets/RISCV.cpp clang/lib/Basic/Targets/RISCV.h clang/lib/Driver/ToolChains/Arch/RISCV.cpp clang/test/Driver/riscv-arch.c clang/test/Preprocessor/riscv-target-features.c Removed: diff --git a/clang/lib/Basic/Targets/RISCV.cpp b/clang/lib/Basic/Targets/RISCV.cpp index 37e688d14b4a..2b076c9c16f2 100644 --- a/clang/lib/Basic/Targets/RISCV.cpp +++ b/clang/lib/Basic/Targets/RISCV.cpp @@ -135,6 +135,9 @@ void RISCVTargetInfo::getTargetDefines(const LangOptions &Opts, if (HasB) Builder.defineMacro("__riscv_bitmanip"); + + if (HasZfh) +Builder.defineMacro("__riscv_zfh"); } /// Return true if has this feature, need to sync with handleTargetFeatures. 
@@ -150,6 +153,7 @@ bool RISCVTargetInfo::hasFeature(StringRef Feature) const { .Case("d", HasD) .Case("c", HasC) .Case("experimental-b", HasB) + .Case("experimental-zfh", HasZfh) .Default(false); } @@ -169,6 +173,8 @@ bool RISCVTargetInfo::handleTargetFeatures(std::vector &Features, HasC = true; else if (Feature == "+experimental-b") HasB = true; +else if (Feature == "+experimental-zfh") + HasZfh = true; } return true; diff --git a/clang/lib/Basic/Targets/RISCV.h b/clang/lib/Basic/Targets/RISCV.h index a4e6777a11e2..20a7b1c73175 100644 --- a/clang/lib/Basic/Targets/RISCV.h +++ b/clang/lib/Basic/Targets/RISCV.h @@ -31,11 +31,12 @@ class RISCVTargetInfo : public TargetInfo { bool HasD; bool HasC; bool HasB; + bool HasZfh; public: RISCVTargetInfo(const llvm::Triple &Triple, const TargetOptions &) - : TargetInfo(Triple), HasM(false), HasA(false), HasF(false), -HasD(false), HasC(false), HasB(false) { + : TargetInfo(Triple), HasM(false), HasA(false), HasF(false), HasD(false), +HasC(false), HasB(false), HasZfh(false) { LongDoubleWidth = 128; LongDoubleAlign = 128; LongDoubleFormat = &llvm::APFloat::IEEEquad(); diff --git a/clang/lib/Driver/ToolChains/Arch/RISCV.cpp b/clang/lib/Driver/ToolChains/Arch/RISCV.cpp index 7ca05a1f3a39..aa1a5d8c803f 100644 --- a/clang/lib/Driver/ToolChains/Arch/RISCV.cpp +++ b/clang/lib/Driver/ToolChains/Arch/RISCV.cpp @@ -64,6 +64,8 @@ isExperimentalExtension(StringRef Ext) { return RISCVExtensionVersion{"0", "92"}; if (Ext == "v") return RISCVExtensionVersion{"0", "9"}; + if (Ext == "zfh") +return RISCVExtensionVersion{"0", "1"}; return None; } diff --git a/clang/test/Driver/riscv-arch.c b/clang/test/Driver/riscv-arch.c index 8b630b1846c9..533f1cff42af 100644 --- a/clang/test/Driver/riscv-arch.c +++ b/clang/test/Driver/riscv-arch.c @@ -383,3 +383,12 @@ // RUN: %clang -target riscv32-unknown-elf -march=rv32iv0p9 -menable-experimental-extensions -### %s -c 2>&1 | \ // RUN: FileCheck -check-prefix=RV32-EXPERIMENTAL-V-GOODVERS %s // 
RV32-EXPERIMENTAL-V-GOODVERS: "-target-feature" "+experimental-v" + +// RUN: %clang -target riscv32-unknown-elf -march=rv32izfh -### %s \ +// RUN: -fsyntax-only 2>&1 | FileCheck -check-prefix=RV32-EXPERIMENTAL-ZFH-NOFLAG %s +// RV32-EXPERIMENTAL-ZFH-NOFLAG: error: invalid arch name 'rv32izfh' +// RV32-EXPERIMENTAL-ZFH-NOFLAG: requires '-menable-experimental-extensions' + +// RUN: %clang -target riscv32-unknown-elf -march=rv32izfh0p1 -menable-experimental-extensions -### %s \ +// RUN: -fsyntax-only 2>&1 | FileCheck -check-prefix=RV32-EXPERIMENTAL-ZFH %s +// RV32-EXPERIMENTAL-ZFH: "-target-feature" "+experimental-zfh" diff --git a/clang/test/Preprocessor/riscv-target-features.c b/clang/test/Preprocessor/riscv-target-features.c index d8c18f76e53b..c0ffd83bc7e2 100644 --- a/clang/test/Preprocessor/riscv-target-features.c +++ b/clang/test/Preprocessor/riscv-target-features.c @@ -78,3 +78,9 @@ // CHECK-DOUBLE: __riscv_float_abi_double 1 // CHECK-DOUBLE-NOT: __riscv_float_abi_soft // CHECK-DOUBLE-NOT: __riscv_float_abi_single + +// RUN: %clang -target riscv32-unknown-linux-gnu -menable-experimental-extensions -march=rv32izfh0p1 -x c -E -dM %s \ +// RUN: -o - | FileCheck --check-prefix=CHECK-ZFH-EXT %s +// RUN: %clang -target riscv64-unknown-linux-gnu -menable-experimental-extensions -march=rv64izfh0p1 -x c -E -dM %s \ +// RUN: -o - | FileCheck --check-prefix=CHECK-ZFH-EXT %s +// CHECK-ZFH-EXT: __riscv_zfh 1
[llvm-branch-commits] [clang] 5e953a2 - [RISCV] Define preprocessor definitions for 'V' extension.
Author: Hsiangkai Wang Date: 2020-12-05T08:34:32+08:00 New Revision: 5e953a274b2ada5bfa54b3d765e391abb03f474f URL: https://github.com/llvm/llvm-project/commit/5e953a274b2ada5bfa54b3d765e391abb03f474f DIFF: https://github.com/llvm/llvm-project/commit/5e953a274b2ada5bfa54b3d765e391abb03f474f.diff LOG: [RISCV] Define preprocessor definitions for 'V' extension. Differential Revision: https://reviews.llvm.org/D92650 Added: Modified: clang/lib/Basic/Targets/RISCV.cpp clang/lib/Basic/Targets/RISCV.h clang/test/Preprocessor/riscv-target-features.c Removed: diff --git a/clang/lib/Basic/Targets/RISCV.cpp b/clang/lib/Basic/Targets/RISCV.cpp index 2b076c9c16f2..4436db904d59 100644 --- a/clang/lib/Basic/Targets/RISCV.cpp +++ b/clang/lib/Basic/Targets/RISCV.cpp @@ -136,6 +136,9 @@ void RISCVTargetInfo::getTargetDefines(const LangOptions &Opts, if (HasB) Builder.defineMacro("__riscv_bitmanip"); + if (HasV) +Builder.defineMacro("__riscv_vector"); + if (HasZfh) Builder.defineMacro("__riscv_zfh"); } @@ -153,6 +156,7 @@ bool RISCVTargetInfo::hasFeature(StringRef Feature) const { .Case("d", HasD) .Case("c", HasC) .Case("experimental-b", HasB) + .Case("experimental-v", HasV) .Case("experimental-zfh", HasZfh) .Default(false); } @@ -173,6 +177,8 @@ bool RISCVTargetInfo::handleTargetFeatures(std::vector &Features, HasC = true; else if (Feature == "+experimental-b") HasB = true; +else if (Feature == "+experimental-v") + HasV = true; else if (Feature == "+experimental-zfh") HasZfh = true; } diff --git a/clang/lib/Basic/Targets/RISCV.h b/clang/lib/Basic/Targets/RISCV.h index 20a7b1c73175..8430407b041e 100644 --- a/clang/lib/Basic/Targets/RISCV.h +++ b/clang/lib/Basic/Targets/RISCV.h @@ -31,12 +31,13 @@ class RISCVTargetInfo : public TargetInfo { bool HasD; bool HasC; bool HasB; + bool HasV; bool HasZfh; public: RISCVTargetInfo(const llvm::Triple &Triple, const TargetOptions &) : TargetInfo(Triple), HasM(false), HasA(false), HasF(false), HasD(false), -HasC(false), HasB(false), HasZfh(false) { 
+HasC(false), HasB(false), HasV(false), HasZfh(false) { LongDoubleWidth = 128; LongDoubleAlign = 128; LongDoubleFormat = &llvm::APFloat::IEEEquad(); diff --git a/clang/test/Preprocessor/riscv-target-features.c b/clang/test/Preprocessor/riscv-target-features.c index c0ffd83bc7e2..d60e7039a92f 100644 --- a/clang/test/Preprocessor/riscv-target-features.c +++ b/clang/test/Preprocessor/riscv-target-features.c @@ -79,6 +79,14 @@ // CHECK-DOUBLE-NOT: __riscv_float_abi_soft // CHECK-DOUBLE-NOT: __riscv_float_abi_single +// RUN: %clang -target riscv32-unknown-linux-gnu -menable-experimental-extensions \ +// RUN: -march=rv32iv0p9 -x c -E -dM %s \ +// RUN: -o - | FileCheck --check-prefix=CHECK-V-EXT %s +// RUN: %clang -target riscv64-unknown-linux-gnu -menable-experimental-extensions \ +// RUN: -march=rv64iv0p9 -x c -E -dM %s \ +// RUN: -o - | FileCheck --check-prefix=CHECK-V-EXT %s +// CHECK-V-EXT: __riscv_vector 1 +// // RUN: %clang -target riscv32-unknown-linux-gnu -menable-experimental-extensions -march=rv32izfh0p1 -x c -E -dM %s \ // RUN: -o - | FileCheck --check-prefix=CHECK-ZFH-EXT %s // RUN: %clang -target riscv64-unknown-linux-gnu -menable-experimental-extensions -march=rv64izfh0p1 -x c -E -dM %s \
[llvm-branch-commits] [llvm] 5aa584e - [RISCV] Separate masked and unmasked definitions for pseudo instructions.
Author: Hsiangkai Wang Date: 2020-12-11T14:02:56+08:00 New Revision: 5aa584ec713c6aefc34ecc997d98c5f05210fa07 URL: https://github.com/llvm/llvm-project/commit/5aa584ec713c6aefc34ecc997d98c5f05210fa07 DIFF: https://github.com/llvm/llvm-project/commit/5aa584ec713c6aefc34ecc997d98c5f05210fa07.diff LOG: [RISCV] Separate masked and unmasked definitions for pseudo instructions. Differential Revision: https://reviews.llvm.org/D93012 Added: Modified: llvm/lib/Target/RISCV/RISCVInstrInfoVPseudos.td llvm/lib/Target/RISCV/RISCVMCInstLower.cpp llvm/test/CodeGen/RISCV/rvv/add-vsetvli-gpr.mir llvm/test/CodeGen/RISCV/rvv/add-vsetvli-vlmax.ll Removed: diff --git a/llvm/lib/Target/RISCV/RISCVInstrInfoVPseudos.td b/llvm/lib/Target/RISCV/RISCVInstrInfoVPseudos.td index a0bcea883118..32762fd2803e 100644 --- a/llvm/lib/Target/RISCV/RISCVInstrInfoVPseudos.td +++ b/llvm/lib/Target/RISCV/RISCVInstrInfoVPseudos.td @@ -126,36 +126,88 @@ def RISCVVPseudosTable : GenericTable { // Helpers to define the diff erent pseudo instructions. 
//===--===// -multiclass pseudo_binary { - let Constraints = "$rd = $merge", - Uses = [VL, VTYPE], VLIndex = 5, SEWIndex = 6, MergeOpIndex = 1, - BaseInstr = !cast(!subst("Pseudo", "", NAME)) in -def "_"# vlmul.MX : Pseudo<(outs result_reg_class:$rd), -(ins result_reg_class:$merge, - op1_reg_class:$rs2, op2_kind:$rs1, - VMaskOp:$vm, GPR:$vl, ixlenimm:$sew), -[]>, - RISCVVPseudo; +class PseudoToVInst { + string VInst = !subst("_M8", "", + !subst("_M4", "", + !subst("_M2", "", + !subst("_M1", "", + !subst("_MF2", "", + !subst("_MF4", "", + !subst("_MF8", "", + !subst("_MASK", "", + !subst("Pseudo", "", PseudoInst); } -multiclass pseudo_binary_v_vv_vx_vi { +class VPseudoBinary : +Pseudo<(outs RetClass:$rd), + (ins Op1Class:$rs2, Op2Class:$rs1, GPR:$vl, ixlenimm:$sew), []>, +RISCVVPseudo { + let Uses = [VL, VTYPE]; + let VLIndex = 3; + let SEWIndex = 4; + let MergeOpIndex = -1; + let BaseInstr = !cast(PseudoToVInst.VInst); +} + +class VPseudoBinaryMask : +Pseudo<(outs RetClass:$rd), +(ins RetClass:$merge, + Op1Class:$rs2, Op2Class:$rs1, + VMaskOp:$vm, GPR:$vl, ixlenimm:$sew), []>, +RISCVVPseudo { + let Constraints = "$rd = $merge"; + let Uses = [VL, VTYPE]; + let VLIndex = 5; + let SEWIndex = 6; + let MergeOpIndex = 1; + let BaseInstr = !cast(PseudoToVInst.VInst); +} + +multiclass VPseudoBinary { + def "_" # MInfo.MX : VPseudoBinary; + def "_" # MInfo.MX # "_MASK" : VPseudoBinaryMask; +} + +multiclass VPseudoBinaryV_VV { let mayLoad = 0, mayStore = 0, hasSideEffects = 0, usesCustomInserter = 1 in foreach m = MxList.m in { let VLMul = m.value in -{ - defvar evr = m.vrclass; - defm _VV : pseudo_binary; - defm _VX : pseudo_binary; - defm _VI : pseudo_binary; -} +defm _VV : VPseudoBinary; + } +} + +multiclass VPseudoBinaryV_VX { + let mayLoad = 0, mayStore = 0, hasSideEffects = 0, usesCustomInserter = 1 in + foreach m = MxList.m in + { +let VLMul = m.value in +defm _VX : VPseudoBinary; + } +} + +multiclass VPseudoBinaryV_VI { + let mayLoad = 0, mayStore = 0, hasSideEffects 
= 0, usesCustomInserter = 1 in + foreach m = MxList.m in + { +let VLMul = m.value in +defm _VI : VPseudoBinary; } } +multiclass VPseudoBinary_VV_VX_VI { + defm "" : VPseudoBinaryV_VV; + defm "" : VPseudoBinaryV_VX; + defm "" : VPseudoBinaryV_VI; +} + //===--===// // Helpers to define the diff erent patterns. //===--===// @@ -167,7 +219,7 @@ multiclass pat_vop_binary { @@ -175,10 +227,8 @@ multiclass pat_vop_binary; } @@ -300,7 +350,7 @@ foreach vti = AllVectors in //===--===// // Pseudo instructions. -defm PseudoVADD: pseudo_binary_v_vv_vx_vi; +defm PseudoVADD: VPseudoBinary_VV_VX_VI; // Whole-register vector patterns. defm "" : pat_vop_binary_common; diff --git a/llvm/lib/Target/RISCV/RISCVMCInstLower.cpp b/llvm/lib/Target/RISCV/RISCVMCInstLower.cpp index 876d557ec79d..b1aacfe878b9 100644 --- a/llvm/lib/Target/RISCV/RISCVMCInstLower.cpp +++ b/llvm/lib/Target/RISCV/RISCVMCInstLower.cpp @@ -176,6 +176,12 @@ static bool lowerRISCVVMachineInstrToMCInst(const MachineInstr *MI, } OutMI.addOperand(MCOp); } + + // Unmasked ps