[PATCH] D49674: [AArch64] Add Tiny Code Model for AArch64

2018-08-22 Thread Dave Green via Phabricator via cfe-commits
dmgreen added a comment.

Thanks


https://reviews.llvm.org/D49674



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D49674: [AArch64] Add Tiny Code Model for AArch64

2018-08-22 Thread Dave Green via Phabricator via cfe-commits
This revision was automatically updated to reflect the committed changes.
Closed by commit rC340398: [AArch64] Add Tiny Code Model for AArch64 (authored 
by dmgreen, committed by ).

Repository:
  rC Clang

https://reviews.llvm.org/D49674

Files:
  include/clang/Driver/CC1Options.td
  lib/CodeGen/BackendUtil.cpp
  lib/Frontend/CompilerInvocation.cpp
  test/Driver/code-model.c


Index: include/clang/Driver/CC1Options.td
===
--- include/clang/Driver/CC1Options.td
+++ include/clang/Driver/CC1Options.td
@@ -249,7 +249,7 @@
 def masm_verbose : Flag<["-"], "masm-verbose">,
   HelpText<"Generate verbose assembly output">;
 def mcode_model : Separate<["-"], "mcode-model">,
-  HelpText<"The code model to use">, Values<"small,kernel,medium,large">;
+  HelpText<"The code model to use">, Values<"tiny,small,kernel,medium,large">;
 def mdebug_pass : Separate<["-"], "mdebug-pass">,
   HelpText<"Enable additional debug output">;
 def mdisable_fp_elim : Flag<["-"], "mdisable-fp-elim">,
Index: test/Driver/code-model.c
===
--- test/Driver/code-model.c
+++ test/Driver/code-model.c
@@ -1,9 +1,11 @@
+// RUN: %clang -### -c -mcmodel=tiny %s 2>&1 | FileCheck -check-prefix 
CHECK-TINY %s
 // RUN: %clang -### -c -mcmodel=small %s 2>&1 | FileCheck -check-prefix 
CHECK-SMALL %s
 // RUN: %clang -### -S -mcmodel=kernel %s 2>&1 | FileCheck -check-prefix 
CHECK-KERNEL %s
 // RUN: %clang -### -c -mcmodel=medium %s 2>&1 | FileCheck -check-prefix 
CHECK-MEDIUM %s
 // RUN: %clang -### -S -mcmodel=large %s 2>&1 | FileCheck -check-prefix 
CHECK-LARGE %s
 // RUN: not %clang -c -mcmodel=lager %s 2>&1 | FileCheck -check-prefix 
CHECK-INVALID %s
 
+// CHECK-TINY: "-mcode-model" "tiny"
 // CHECK-SMALL: "-mcode-model" "small"
 // CHECK-KERNEL: "-mcode-model" "kernel"
 // CHECK-MEDIUM: "-mcode-model" "medium"
Index: lib/CodeGen/BackendUtil.cpp
===
--- lib/CodeGen/BackendUtil.cpp
+++ lib/CodeGen/BackendUtil.cpp
@@ -368,6 +368,7 @@
 static Optional
 getCodeModel(const CodeGenOptions &CodeGenOpts) {
   unsigned CodeModel = llvm::StringSwitch(CodeGenOpts.CodeModel)
+   .Case("tiny", llvm::CodeModel::Tiny)
.Case("small", llvm::CodeModel::Small)
.Case("kernel", llvm::CodeModel::Kernel)
.Case("medium", llvm::CodeModel::Medium)
Index: lib/Frontend/CompilerInvocation.cpp
===
--- lib/Frontend/CompilerInvocation.cpp
+++ lib/Frontend/CompilerInvocation.cpp
@@ -370,7 +370,7 @@
   if (Arg *A = Args.getLastArg(OPT_mcode_model)) {
 StringRef Value = A->getValue();
 if (Value == "small" || Value == "kernel" || Value == "medium" ||
-Value == "large")
+Value == "large" || Value == "tiny")
   return Value;
 Diags.Report(diag::err_drv_invalid_value) << A->getAsString(Args) << Value;
   }


Index: include/clang/Driver/CC1Options.td
===
--- include/clang/Driver/CC1Options.td
+++ include/clang/Driver/CC1Options.td
@@ -249,7 +249,7 @@
 def masm_verbose : Flag<["-"], "masm-verbose">,
   HelpText<"Generate verbose assembly output">;
 def mcode_model : Separate<["-"], "mcode-model">,
-  HelpText<"The code model to use">, Values<"small,kernel,medium,large">;
+  HelpText<"The code model to use">, Values<"tiny,small,kernel,medium,large">;
 def mdebug_pass : Separate<["-"], "mdebug-pass">,
   HelpText<"Enable additional debug output">;
 def mdisable_fp_elim : Flag<["-"], "mdisable-fp-elim">,
Index: test/Driver/code-model.c
===
--- test/Driver/code-model.c
+++ test/Driver/code-model.c
@@ -1,9 +1,11 @@
+// RUN: %clang -### -c -mcmodel=tiny %s 2>&1 | FileCheck -check-prefix CHECK-TINY %s
 // RUN: %clang -### -c -mcmodel=small %s 2>&1 | FileCheck -check-prefix CHECK-SMALL %s
 // RUN: %clang -### -S -mcmodel=kernel %s 2>&1 | FileCheck -check-prefix CHECK-KERNEL %s
 // RUN: %clang -### -c -mcmodel=medium %s 2>&1 | FileCheck -check-prefix CHECK-MEDIUM %s
 // RUN: %clang -### -S -mcmodel=large %s 2>&1 | FileCheck -check-prefix CHECK-LARGE %s
 // RUN: not %clang -c -mcmodel=lager %s 2>&1 | FileCheck -check-prefix CHECK-INVALID %s
 
+// CHECK-TINY: "-mcode-model" "tiny"
 // CHECK-SMALL: "-mcode-model" "small"
 // CHECK-KERNEL: "-mcode-model" "kernel"
 // CHECK-MEDIUM: "-mcode-model" "medium"
Index: lib/CodeGen/BackendUtil.cpp
===
--- lib/CodeGen/BackendUtil.cpp
+++ lib/CodeGen/BackendUtil.cpp
@@ -368,6 +368,7 @@
 static Optional
 getCodeModel(const CodeGenOptions &CodeGenOpts) {
   unsigned CodeModel = llvm::StringSwitch(CodeGenOpts.CodeModel)
+   .Case("tiny", llvm::CodeModel::Tiny)
   

[PATCH] D47267: [UnrollAndJam] Add unroll_and_jam pragma handling

2018-05-23 Thread Dave Green via Phabricator via cfe-commits
dmgreen created this revision.
dmgreen added reviewers: SjoerdMeijer, hfinkel, tyler.nowicki, anemet, alexfh.
Herald added a subscriber: zzheng.

This adds support for the unroll_and_jam pragma, to go with 
https://reviews.llvm.org/D41953. The name of
the pragma is copied from the Intel compiler, and most of the code works the 
same as
for unroll.

I have some doubts whether this will ever be used sensibly in the real world, 
but can
be useful for testing and was not difficult to put together.


https://reviews.llvm.org/D47267

Files:
  include/clang/Basic/Attr.td
  include/clang/Basic/DiagnosticParseKinds.td
  include/clang/Parse/Parser.h
  lib/CodeGen/CGLoopInfo.cpp
  lib/CodeGen/CGLoopInfo.h
  lib/Parse/ParsePragma.cpp
  lib/Sema/SemaStmtAttr.cpp
  test/CodeGenCXX/pragma-unroll-and-jam.cpp
  test/Parser/pragma-loop.cpp
  test/Parser/pragma-unroll-and-jam.cpp

Index: test/Parser/pragma-unroll-and-jam.cpp
===
--- /dev/null
+++ test/Parser/pragma-unroll-and-jam.cpp
@@ -0,0 +1,78 @@
+// RUN: %clang_cc1 -std=c++11 -verify %s
+
+// Note that this puts the expected lines before the directives to work around
+// limitations in the -verify mode.
+
+void test(int *List, int Length, int Value) {
+  int i = 0;
+
+#pragma unroll_and_jam
+  for (int i = 0; i < Length; i++) {
+for (int j = 0; j < Length; j++) {
+  List[i * Length + j] = Value;
+}
+  }
+
+#pragma nounroll_and_jam
+  for (int i = 0; i < Length; i++) {
+for (int j = 0; j < Length; j++) {
+  List[i * Length + j] = Value;
+}
+  }
+
+#pragma unroll_and_jam 4
+  for (int i = 0; i < Length; i++) {
+for (int j = 0; j < Length; j++) {
+  List[i * Length + j] = Value;
+}
+  }
+
+/* expected-error {{expected ')'}} */ #pragma unroll_and_jam(4
+/* expected-error {{missing argument; expected an integer value}} */ #pragma unroll_and_jam()
+/* expected-warning {{extra tokens at end of '#pragma unroll_and_jam'}} */ #pragma unroll_and_jam 1 2
+  for (int i = 0; i < Length; i++) {
+for (int j = 0; j < Length; j++) {
+  List[i * Length + j] = Value;
+}
+  }
+
+/* expected-warning {{extra tokens at end of '#pragma nounroll_and_jam'}} */ #pragma nounroll_and_jam 1
+  for (int i = 0; i < Length; i++) {
+for (int j = 0; j < Length; j++) {
+  List[i * Length + j] = Value;
+}
+  }
+
+#pragma unroll_and_jam
+/* expected-error {{expected a for, while, or do-while loop to follow '#pragma unroll_and_jam'}} */ int j = Length;
+#pragma unroll_and_jam 4
+/* expected-error {{expected a for, while, or do-while loop to follow '#pragma unroll_and_jam'}} */ int k = Length;
+#pragma nounroll_and_jam
+/* expected-error {{expected a for, while, or do-while loop to follow '#pragma nounroll_and_jam'}} */ int l = Length;
+
+/* expected-error {{incompatible directives 'unroll_and_jam(disable)' and '#pragma unroll_and_jam(4)'}} */ #pragma unroll_and_jam 4
+#pragma clang loop unroll_and_jam(disable)
+  for (int i = 0; i < Length; i++) {
+for (int j = 0; j < Length; j++) {
+  List[i * Length + j] = Value;
+}
+  }
+
+/* expected-error {{incompatible directives 'unroll_and_jam(full)' and '#pragma unroll_and_jam(4)'}} */ #pragma unroll_and_jam(4)
+#pragma clang loop unroll_and_jam(full)
+  for (int i = 0; i < Length; i++) {
+for (int j = 0; j < Length; j++) {
+  List[i * Length + j] = Value;
+}
+  }
+
+#pragma nounroll_and_jam
+#pragma unroll(4)
+  for (int i = 0; i < Length; i++) {
+for (int j = 0; j < Length; j++) {
+  List[i * Length + j] = Value;
+}
+  }
+
+#pragma unroll_and_jam
+/* expected-error {{expected statement}} */ }
Index: test/Parser/pragma-loop.cpp
===
--- test/Parser/pragma-loop.cpp
+++ test/Parser/pragma-loop.cpp
@@ -147,7 +147,7 @@
 /* expected-error {{missing argument; expected 'enable', 'full' or 'disable'}} */ #pragma clang loop unroll()
 /* expected-error {{missing argument; expected 'enable' or 'disable'}} */ #pragma clang loop distribute()
 
-/* expected-error {{missing option; expected vectorize, vectorize_width, interleave, interleave_count, unroll, unroll_count, or distribute}} */ #pragma clang loop
+/* expected-error {{missing option; expected vectorize, vectorize_width, interleave, interleave_count, unroll, unroll_count, unroll_and_jam, unroll_and_jam_count, or distribute}} */ #pragma clang loop
 /* expected-error {{invalid option 'badkeyword'}} */ #pragma clang loop badkeyword
 /* expected-error {{invalid option 'badkeyword'}} */ #pragma clang loop badkeyword(enable)
 /* expected-error {{invalid option 'badkeyword'}} */ #pragma clang loop vectorize(enable) badkeyword(4)
Index: test/CodeGenCXX/pragma-unroll-and-jam.cpp
===
--- /dev/null
+++ test/CodeGenCXX/pragma-unroll-and-jam.cpp
@@ -0,0 +1,91 @@
+// RUN: %clang_cc1 -triple arm-none-eabi -std=c++11 -emit-llvm -o - %s | FileCheck %s
+
+voi

[PATCH] D47267: [UnrollAndJam] Add unroll_and_jam pragma handling

2018-05-23 Thread Dave Green via Phabricator via cfe-commits
dmgreen added a comment.

> In my experience, they are used.

Good to know, cheers.

> Could we maybe disable the #pragma clang loop unroll_and_jam spelling ftm to 
> avoid compatibility issues?

Sure, I'm not against. It sounds like you have opinions on how this should 
work. That's good. If there are multiple clang loop pragma's, what is the 
expected behaviour there?

In the llvm side of this, in the unroll and jam pass, I made it so that a loop 
with "llvm.loop.unroll" metadata without any "llvm.loop.unroll_and_jam" 
metadata will not do unroll and jam, it will leave the loop for the unroller. 
On the expectation that the use really wants to unroll (and it applies to 
llvm.loop.unroll.disable too, disabling unroll and jam as well as unroll). I 
haven't done anything with other loop metadata though.


https://reviews.llvm.org/D47267



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D47320: [UnrollAndJam] Add pragma clang loop unroll_and_jam handling.

2018-05-24 Thread Dave Green via Phabricator via cfe-commits
dmgreen created this revision.
Herald added a subscriber: zzheng.

Split out from https://reviews.llvm.org/D47267 when we need it.


https://reviews.llvm.org/D47320

Files:
  include/clang/Basic/DiagnosticParseKinds.td
  lib/Parse/ParsePragma.cpp
  lib/Sema/SemaStmtAttr.cpp
  test/CodeGenCXX/pragma-unroll-and-jam.cpp
  test/Parser/pragma-loop.cpp
  test/Parser/pragma-unroll-and-jam.cpp

Index: test/Parser/pragma-unroll-and-jam.cpp
===
--- test/Parser/pragma-unroll-and-jam.cpp
+++ test/Parser/pragma-unroll-and-jam.cpp
@@ -50,24 +50,24 @@
 #pragma nounroll_and_jam
 /* expected-error {{expected a for, while, or do-while loop to follow '#pragma nounroll_and_jam'}} */ int l = Length;
 
-/* expected-error {{incompatible directives '#pragma nounroll_and_jam' and '#pragma unroll_and_jam(4)'}} */ #pragma unroll_and_jam 4
-#pragma nounroll_and_jam
+/* expected-error {{incompatible directives 'unroll_and_jam(disable)' and '#pragma unroll_and_jam(4)'}} */ #pragma unroll_and_jam 4
+#pragma clang loop unroll_and_jam(disable)
   for (int i = 0; i < Length; i++) {
 for (int j = 0; j < Length; j++) {
   List[i * Length + j] = Value;
 }
   }
 
-#pragma nounroll_and_jam
-#pragma unroll(4)
+/* expected-error {{incompatible directives 'unroll_and_jam(full)' and '#pragma unroll_and_jam(4)'}} */ #pragma unroll_and_jam(4)
+#pragma clang loop unroll_and_jam(full)
   for (int i = 0; i < Length; i++) {
 for (int j = 0; j < Length; j++) {
   List[i * Length + j] = Value;
 }
   }
 
-// pragma clang unroll_and_jam is disabled for the moment
-/* expected-error {{invalid option 'unroll_and_jam'; expected vectorize, vectorize_width, interleave, interleave_count, unroll, unroll_count, or distribute}} */ #pragma clang loop unroll_and_jam(4)
+#pragma nounroll_and_jam
+#pragma unroll(4)
   for (int i = 0; i < Length; i++) {
 for (int j = 0; j < Length; j++) {
   List[i * Length + j] = Value;
Index: test/Parser/pragma-loop.cpp
===
--- test/Parser/pragma-loop.cpp
+++ test/Parser/pragma-loop.cpp
@@ -147,7 +147,7 @@
 /* expected-error {{missing argument; expected 'enable', 'full' or 'disable'}} */ #pragma clang loop unroll()
 /* expected-error {{missing argument; expected 'enable' or 'disable'}} */ #pragma clang loop distribute()
 
-/* expected-error {{missing option; expected vectorize, vectorize_width, interleave, interleave_count, unroll, unroll_count, or distribute}} */ #pragma clang loop
+/* expected-error {{missing option; expected vectorize, vectorize_width, interleave, interleave_count, unroll, unroll_count, unroll_and_jam, unroll_and_jam_count, or distribute}} */ #pragma clang loop
 /* expected-error {{invalid option 'badkeyword'}} */ #pragma clang loop badkeyword
 /* expected-error {{invalid option 'badkeyword'}} */ #pragma clang loop badkeyword(enable)
 /* expected-error {{invalid option 'badkeyword'}} */ #pragma clang loop vectorize(enable) badkeyword(4)
Index: test/CodeGenCXX/pragma-unroll-and-jam.cpp
===
--- test/CodeGenCXX/pragma-unroll-and-jam.cpp
+++ test/CodeGenCXX/pragma-unroll-and-jam.cpp
@@ -33,6 +33,39 @@
   }
 }
 
+void clang_unroll_and_jam(int *List, int Length, int Value) {
+  // CHECK: define {{.*}} @_Z20clang_unroll_and_jam
+#pragma clang loop unroll_and_jam(enable)
+  for (int i = 0; i < Length; i++) {
+for (int j = 0; j < Length; j++) {
+  // CHECK: br label {{.*}}, !llvm.loop ![[LOOP_4:.*]]
+  List[i * Length + j] = Value;
+}
+  }
+}
+
+void clang_unroll_and_jam_count(int *List, int Length, int Value) {
+  // CHECK: define {{.*}} @_Z26clang_unroll_and_jam_count
+#pragma clang loop unroll_and_jam_count(4)
+  for (int i = 0; i < Length; i++) {
+for (int j = 0; j < Length; j++) {
+  // CHECK: br label {{.*}}, !llvm.loop ![[LOOP_5:.*]]
+  List[i * Length + j] = Value;
+}
+  }
+}
+
+void clang_nounroll_and_jam(int *List, int Length, int Value) {
+  // CHECK: define {{.*}} @_Z22clang_nounroll_and_jam
+#pragma clang loop unroll_and_jam(disable)
+  for (int i = 0; i < Length; i++) {
+for (int j = 0; j < Length; j++) {
+  // CHECK: br label {{.*}}, !llvm.loop ![[LOOP_6:.*]]
+  List[i * Length + j] = Value;
+}
+  }
+}
+
 void clang_unroll_plus_nounroll_and_jam(int *List, int Length, int Value) {
   // CHECK: define {{.*}} @_Z34clang_unroll_plus_nounroll_and_jam
 #pragma nounroll_and_jam
@@ -51,5 +84,8 @@
 // CHECK: ![[UNJ_4]] = !{!"llvm.loop.unroll_and_jam.count", i32 4}
 // CHECK: ![[LOOP_3]] = distinct !{![[LOOP_3]], ![[UNJ_DISABLE:.*]]}
 // CHECK: ![[UNJ_DISABLE]] = !{!"llvm.loop.unroll_and_jam.disable"}
+// CHECK: ![[LOOP_4]] = distinct !{![[LOOP_4]], ![[UNJ_ENABLE:.*]]}
+// CHECK: ![[LOOP_5]] = distinct !{![[LOOP_5]], ![[UNJ_4:.*]]}
+// CHECK: ![[LOOP_6]] = distinct !{![[LOOP_6]], ![[UNJ_DISABLE:.*]]}
 // CHECK: ![[LOOP_7]] = distinct !{![[LOOP_7]], ![[UN

[PATCH] D47267: [UnrollAndJam] Add unroll_and_jam pragma handling

2018-05-24 Thread Dave Green via Phabricator via cfe-commits
dmgreen updated this revision to Diff 148393.
dmgreen added a comment.

This splits out the pragma clang loop unroll_and_jam handling into 
https://reviews.llvm.org/D47320, for if/when we need it. Which I believe is 
what you wanted, correct me if I'm wrong.


https://reviews.llvm.org/D47267

Files:
  include/clang/Basic/Attr.td
  include/clang/Parse/Parser.h
  lib/CodeGen/CGLoopInfo.cpp
  lib/CodeGen/CGLoopInfo.h
  lib/Parse/ParsePragma.cpp
  lib/Sema/SemaStmtAttr.cpp
  test/CodeGenCXX/pragma-unroll-and-jam.cpp
  test/Parser/pragma-unroll-and-jam.cpp

Index: test/Parser/pragma-unroll-and-jam.cpp
===
--- /dev/null
+++ test/Parser/pragma-unroll-and-jam.cpp
@@ -0,0 +1,78 @@
+// RUN: %clang_cc1 -std=c++11 -verify %s
+
+// Note that this puts the expected lines before the directives to work around
+// limitations in the -verify mode.
+
+void test(int *List, int Length, int Value) {
+  int i = 0;
+
+#pragma unroll_and_jam
+  for (int i = 0; i < Length; i++) {
+for (int j = 0; j < Length; j++) {
+  List[i * Length + j] = Value;
+}
+  }
+
+#pragma nounroll_and_jam
+  for (int i = 0; i < Length; i++) {
+for (int j = 0; j < Length; j++) {
+  List[i * Length + j] = Value;
+}
+  }
+
+#pragma unroll_and_jam 4
+  for (int i = 0; i < Length; i++) {
+for (int j = 0; j < Length; j++) {
+  List[i * Length + j] = Value;
+}
+  }
+
+/* expected-error {{expected ')'}} */ #pragma unroll_and_jam(4
+/* expected-error {{missing argument; expected an integer value}} */ #pragma unroll_and_jam()
+/* expected-warning {{extra tokens at end of '#pragma unroll_and_jam'}} */ #pragma unroll_and_jam 1 2
+  for (int i = 0; i < Length; i++) {
+for (int j = 0; j < Length; j++) {
+  List[i * Length + j] = Value;
+}
+  }
+
+/* expected-warning {{extra tokens at end of '#pragma nounroll_and_jam'}} */ #pragma nounroll_and_jam 1
+  for (int i = 0; i < Length; i++) {
+for (int j = 0; j < Length; j++) {
+  List[i * Length + j] = Value;
+}
+  }
+
+#pragma unroll_and_jam
+/* expected-error {{expected a for, while, or do-while loop to follow '#pragma unroll_and_jam'}} */ int j = Length;
+#pragma unroll_and_jam 4
+/* expected-error {{expected a for, while, or do-while loop to follow '#pragma unroll_and_jam'}} */ int k = Length;
+#pragma nounroll_and_jam
+/* expected-error {{expected a for, while, or do-while loop to follow '#pragma nounroll_and_jam'}} */ int l = Length;
+
+/* expected-error {{incompatible directives '#pragma nounroll_and_jam' and '#pragma unroll_and_jam(4)'}} */ #pragma unroll_and_jam 4
+#pragma nounroll_and_jam
+  for (int i = 0; i < Length; i++) {
+for (int j = 0; j < Length; j++) {
+  List[i * Length + j] = Value;
+}
+  }
+
+#pragma nounroll_and_jam
+#pragma unroll(4)
+  for (int i = 0; i < Length; i++) {
+for (int j = 0; j < Length; j++) {
+  List[i * Length + j] = Value;
+}
+  }
+
+// pragma clang unroll_and_jam is disabled for the moment
+/* expected-error {{invalid option 'unroll_and_jam'; expected vectorize, vectorize_width, interleave, interleave_count, unroll, unroll_count, or distribute}} */ #pragma clang loop unroll_and_jam(4)
+  for (int i = 0; i < Length; i++) {
+for (int j = 0; j < Length; j++) {
+  List[i * Length + j] = Value;
+}
+  }
+
+#pragma unroll_and_jam
+/* expected-error {{expected statement}} */ }
Index: test/CodeGenCXX/pragma-unroll-and-jam.cpp
===
--- /dev/null
+++ test/CodeGenCXX/pragma-unroll-and-jam.cpp
@@ -0,0 +1,55 @@
+// RUN: %clang_cc1 -triple arm-none-eabi -std=c++11 -emit-llvm -o - %s | FileCheck %s
+
+void unroll_and_jam(int *List, int Length, int Value) {
+  // CHECK: define {{.*}} @_Z14unroll_and_jam
+#pragma unroll_and_jam
+  for (int i = 0; i < Length; i++) {
+for (int j = 0; j < Length; j++) {
+  // CHECK: br label {{.*}}, !llvm.loop ![[LOOP_1:.*]]
+  List[i * Length + j] = Value;
+}
+  }
+}
+
+void unroll_and_jam_count(int *List, int Length, int Value) {
+  // CHECK: define {{.*}} @_Z20unroll_and_jam_count
+#pragma unroll_and_jam(4)
+  for (int i = 0; i < Length; i++) {
+for (int j = 0; j < Length; j++) {
+  // CHECK: br label {{.*}}, !llvm.loop ![[LOOP_2:.*]]
+  List[i * Length + j] = Value;
+}
+  }
+}
+
+void nounroll_and_jam(int *List, int Length, int Value) {
+  // CHECK: define {{.*}} @_Z16nounroll_and_jam
+#pragma nounroll_and_jam
+  for (int i = 0; i < Length; i++) {
+for (int j = 0; j < Length; j++) {
+  // CHECK: br label {{.*}}, !llvm.loop ![[LOOP_3:.*]]
+  List[i * Length + j] = Value;
+}
+  }
+}
+
+void clang_unroll_plus_nounroll_and_jam(int *List, int Length, int Value) {
+  // CHECK: define {{.*}} @_Z34clang_unroll_plus_nounroll_and_jam
+#pragma nounroll_and_jam
+#pragma unroll(4)
+  for (int i = 0; i < Length; i++) {
+for (int j = 0; j < Length; j++) {
+  // CHECK: br label {{.*}}, !llvm.loop ![[LOOP_7:.*

[PATCH] D51416: [RTTI] Align rtti type string to prevent over-alignment

2018-08-29 Thread Dave Green via Phabricator via cfe-commits
dmgreen created this revision.
dmgreen added reviewers: efriedma, echristo, hfinkel, davide.

Previously the alignment on the newly created rtti string was not set,
meaning that DataLayout::getPreferredAlignment was free to overalign
it to 16 bytes. This causes unnecessary code bloat.


https://reviews.llvm.org/D51416

Files:
  lib/CodeGen/ItaniumCXXABI.cpp
  test/CodeGenCXX/vtable-align.cpp
  test/CodeGenCXX/vtable-linkage.cpp


Index: test/CodeGenCXX/vtable-linkage.cpp
===
--- test/CodeGenCXX/vtable-linkage.cpp
+++ test/CodeGenCXX/vtable-linkage.cpp
@@ -99,7 +99,7 @@
 // C has no key function, so its vtable should have weak_odr linkage
 // and hidden visibility (rdar://problem/7523229).
 // CHECK-DAG: @_ZTV1C = linkonce_odr unnamed_addr constant {{.*}}, comdat,
-// CHECK-DAG: @_ZTS1C = linkonce_odr constant {{.*}}, comdat{{$}}
+// CHECK-DAG: @_ZTS1C = linkonce_odr constant {{.*}}, comdat, align 1{{$}}
 // CHECK-DAG: @_ZTI1C = linkonce_odr constant {{.*}}, comdat{{$}}
 // CHECK-DAG: @_ZTT1C = linkonce_odr unnamed_addr constant {{.*}}, comdat{{$}}
 
@@ -120,26 +120,26 @@
 // defined in this translation unit, so its vtable should have
 // weak_odr linkage.
 // CHECK-DAG: @_ZTV1EIsE = weak_odr unnamed_addr constant {{.*}}, comdat,
-// CHECK-DAG: @_ZTS1EIsE = weak_odr constant {{.*}}, comdat{{$}}
+// CHECK-DAG: @_ZTS1EIsE = weak_odr constant {{.*}}, comdat, align 1{{$}}
 // CHECK-DAG: @_ZTI1EIsE = weak_odr constant {{.*}}, comdat{{$}}
 
 // F is an explicit template instantiation without a key
 // function, so its vtable should have weak_odr linkage
 // CHECK-DAG: @_ZTV1FIsE = weak_odr unnamed_addr constant {{.*}}, comdat,
-// CHECK-DAG: @_ZTS1FIsE = weak_odr constant {{.*}}, comdat{{$}}
+// CHECK-DAG: @_ZTS1FIsE = weak_odr constant {{.*}}, comdat, align 1{{$}}
 // CHECK-DAG: @_ZTI1FIsE = weak_odr constant {{.*}}, comdat{{$}}
 
 // E is an implicit template instantiation with a key function
 // defined in this translation unit, so its vtable should have
 // linkonce_odr linkage.
 // CHECK-DAG: @_ZTV1EIlE = linkonce_odr unnamed_addr constant {{.*}}, comdat,
-// CHECK-DAG: @_ZTS1EIlE = linkonce_odr constant {{.*}}, comdat{{$}}
+// CHECK-DAG: @_ZTS1EIlE = linkonce_odr constant {{.*}}, comdat, align 1{{$}}
 // CHECK-DAG: @_ZTI1EIlE = linkonce_odr constant {{.*}}, comdat{{$}}
 
 // F is an implicit template instantiation with no key function,
 // so its vtable should have linkonce_odr linkage.
 // CHECK-DAG: @_ZTV1FIlE = linkonce_odr unnamed_addr constant {{.*}}, comdat,
-// CHECK-DAG: @_ZTS1FIlE = linkonce_odr constant {{.*}}, comdat{{$}}
+// CHECK-DAG: @_ZTS1FIlE = linkonce_odr constant {{.*}}, comdat, align 1{{$}}
 // CHECK-DAG: @_ZTI1FIlE = linkonce_odr constant {{.*}}, comdat{{$}}
 
 // F is an explicit template instantiation declaration without a
@@ -171,7 +171,7 @@
 // F is an explicit specialization without a key function, so
 // its vtable should have linkonce_odr linkage.
 // CHECK-DAG: @_ZTV1FIcE = linkonce_odr unnamed_addr constant {{.*}}, comdat,
-// CHECK-DAG: @_ZTS1FIcE = linkonce_odr constant {{.*}}, comdat{{$}}
+// CHECK-DAG: @_ZTS1FIcE = linkonce_odr constant {{.*}}, comdat, align 1{{$}}
 // CHECK-DAG: @_ZTI1FIcE = linkonce_odr constant {{.*}}, comdat{{$}}
 
 // CHECK-DAG: @_ZTV1GIiE = linkonce_odr unnamed_addr constant {{.*}}, comdat,
Index: test/CodeGenCXX/vtable-align.cpp
===
--- test/CodeGenCXX/vtable-align.cpp
+++ test/CodeGenCXX/vtable-align.cpp
@@ -10,5 +10,6 @@
 void A::f() {}
 
 // CHECK-32: @_ZTV1A = unnamed_addr constant { [5 x i8*] } { [5 x i8*] [i8* 
null, i8* bitcast ({ i8*, i8* }* @_ZTI1A to i8*), i8* bitcast (void 
(%struct.A*)* @_ZN1A1fEv to i8*), i8* bitcast (void (%struct.A*)* @_ZN1A1gEv to 
i8*), i8* bitcast (void (%struct.A*)* @_ZN1A1hEv to i8*)] }, align 4
-
+// CHECK-32: @_ZTS1A = constant [3 x i8] c"1A\00", align 1
 // CHECK-64: @_ZTV1A = unnamed_addr constant { [5 x i8*] } { [5 x i8*] [i8* 
null, i8* bitcast ({ i8*, i8* }* @_ZTI1A to i8*), i8* bitcast (void 
(%struct.A*)* @_ZN1A1fEv to i8*), i8* bitcast (void (%struct.A*)* @_ZN1A1gEv to 
i8*), i8* bitcast (void (%struct.A*)* @_ZN1A1hEv to i8*)] }, align 8
+// CHECK-64: @_ZTS1A = constant [3 x i8] c"1A\00", align 1
Index: lib/CodeGen/ItaniumCXXABI.cpp
===
--- lib/CodeGen/ItaniumCXXABI.cpp
+++ lib/CodeGen/ItaniumCXXABI.cpp
@@ -2730,6 +2730,7 @@
 CGM.CreateOrReplaceCXXRuntimeVariable(Name, Init->getType(), Linkage);
 
   GV->setInitializer(Init);
+  GV->setAlignment(1); // prevent over-aligning the string
 
   return GV;
 }


Index: test/CodeGenCXX/vtable-linkage.cpp
===
--- test/CodeGenCXX/vtable-linkage.cpp
+++ test/CodeGenCXX/vtable-linkage.cpp
@@ -99,7 +99,7 @@
 // C has no key function, so its vtable should have weak_odr linkage
 // and hidden visibility (rdar://pr

[PATCH] D51416: [RTTI] Align rtti types to prevent over-alignment

2018-09-03 Thread Dave Green via Phabricator via cfe-commits
dmgreen updated this revision to Diff 163727.
dmgreen retitled this revision from "[RTTI] Align rtti type string to prevent 
over-alignment" to "[RTTI] Align rtti types to prevent over-alignment".
dmgreen edited the summary of this revision.
dmgreen added a comment.

I've become less sure about what the various alignments should be here. There 
seem to be multiple ways to calculate such things, I'm not sure which are best. 
For example, for want of a better option I've used getPointerAlign in 
ItaniumRTTIBuilder::BuildTypeInfo.


https://reviews.llvm.org/D51416

Files:
  lib/CodeGen/CGVTT.cpp
  lib/CodeGen/CGVTables.cpp
  lib/CodeGen/CodeGenModule.cpp
  lib/CodeGen/CodeGenModule.h
  lib/CodeGen/ItaniumCXXABI.cpp
  lib/CodeGen/MicrosoftCXXABI.cpp
  test/CodeGenCXX/vtable-align.cpp
  test/CodeGenCXX/vtable-linkage.cpp

Index: test/CodeGenCXX/vtable-linkage.cpp
===
--- test/CodeGenCXX/vtable-linkage.cpp
+++ test/CodeGenCXX/vtable-linkage.cpp
@@ -98,10 +98,10 @@
 
 // C has no key function, so its vtable should have weak_odr linkage
 // and hidden visibility (rdar://problem/7523229).
-// CHECK-DAG: @_ZTV1C = linkonce_odr unnamed_addr constant {{.*}}, comdat,
-// CHECK-DAG: @_ZTS1C = linkonce_odr constant {{.*}}, comdat{{$}}
-// CHECK-DAG: @_ZTI1C = linkonce_odr constant {{.*}}, comdat{{$}}
-// CHECK-DAG: @_ZTT1C = linkonce_odr unnamed_addr constant {{.*}}, comdat{{$}}
+// CHECK-DAG: @_ZTV1C = linkonce_odr unnamed_addr constant {{.*}}, comdat, align 8{{$}}
+// CHECK-DAG: @_ZTS1C = linkonce_odr constant {{.*}}, comdat, align 1{{$}}
+// CHECK-DAG: @_ZTI1C = linkonce_odr constant {{.*}}, comdat, align 8{{$}}
+// CHECK-DAG: @_ZTT1C = linkonce_odr unnamed_addr constant {{.*}}, comdat, align 8{{$}}
 
 // D has a key function that is defined in this translation unit so its vtable is
 // defined in the translation unit.
@@ -120,27 +120,27 @@
 // defined in this translation unit, so its vtable should have
 // weak_odr linkage.
 // CHECK-DAG: @_ZTV1EIsE = weak_odr unnamed_addr constant {{.*}}, comdat,
-// CHECK-DAG: @_ZTS1EIsE = weak_odr constant {{.*}}, comdat{{$}}
-// CHECK-DAG: @_ZTI1EIsE = weak_odr constant {{.*}}, comdat{{$}}
+// CHECK-DAG: @_ZTS1EIsE = weak_odr constant {{.*}}, comdat, align 1{{$}}
+// CHECK-DAG: @_ZTI1EIsE = weak_odr constant {{.*}}, comdat, align 8{{$}}
 
 // F is an explicit template instantiation without a key
 // function, so its vtable should have weak_odr linkage
 // CHECK-DAG: @_ZTV1FIsE = weak_odr unnamed_addr constant {{.*}}, comdat,
-// CHECK-DAG: @_ZTS1FIsE = weak_odr constant {{.*}}, comdat{{$}}
-// CHECK-DAG: @_ZTI1FIsE = weak_odr constant {{.*}}, comdat{{$}}
+// CHECK-DAG: @_ZTS1FIsE = weak_odr constant {{.*}}, comdat, align 1{{$}}
+// CHECK-DAG: @_ZTI1FIsE = weak_odr constant {{.*}}, comdat, align 8{{$}}
 
 // E is an implicit template instantiation with a key function
 // defined in this translation unit, so its vtable should have
 // linkonce_odr linkage.
 // CHECK-DAG: @_ZTV1EIlE = linkonce_odr unnamed_addr constant {{.*}}, comdat,
-// CHECK-DAG: @_ZTS1EIlE = linkonce_odr constant {{.*}}, comdat{{$}}
-// CHECK-DAG: @_ZTI1EIlE = linkonce_odr constant {{.*}}, comdat{{$}}
+// CHECK-DAG: @_ZTS1EIlE = linkonce_odr constant {{.*}}, comdat, align 1{{$}}
+// CHECK-DAG: @_ZTI1EIlE = linkonce_odr constant {{.*}}, comdat, align 8{{$}}
 
 // F is an implicit template instantiation with no key function,
 // so its vtable should have linkonce_odr linkage.
 // CHECK-DAG: @_ZTV1FIlE = linkonce_odr unnamed_addr constant {{.*}}, comdat,
-// CHECK-DAG: @_ZTS1FIlE = linkonce_odr constant {{.*}}, comdat{{$}}
-// CHECK-DAG: @_ZTI1FIlE = linkonce_odr constant {{.*}}, comdat{{$}}
+// CHECK-DAG: @_ZTS1FIlE = linkonce_odr constant {{.*}}, comdat, align 1{{$}}
+// CHECK-DAG: @_ZTI1FIlE = linkonce_odr constant {{.*}}, comdat, align 8{{$}}
 
 // F is an explicit template instantiation declaration without a
 // key function, so its vtable should have external linkage.
@@ -171,8 +171,8 @@
 // F is an explicit specialization without a key function, so
 // its vtable should have linkonce_odr linkage.
 // CHECK-DAG: @_ZTV1FIcE = linkonce_odr unnamed_addr constant {{.*}}, comdat,
-// CHECK-DAG: @_ZTS1FIcE = linkonce_odr constant {{.*}}, comdat{{$}}
-// CHECK-DAG: @_ZTI1FIcE = linkonce_odr constant {{.*}}, comdat{{$}}
+// CHECK-DAG: @_ZTS1FIcE = linkonce_odr constant {{.*}}, comdat, align 1{{$}}
+// CHECK-DAG: @_ZTI1FIcE = linkonce_odr constant {{.*}}, comdat, align 8{{$}}
 
 // CHECK-DAG: @_ZTV1GIiE = linkonce_odr unnamed_addr constant {{.*}}, comdat,
 template 
Index: test/CodeGenCXX/vtable-align.cpp
===
--- test/CodeGenCXX/vtable-align.cpp
+++ test/CodeGenCXX/vtable-align.cpp
@@ -10,5 +10,8 @@
 void A::f() {}
 
 // CHECK-32: @_ZTV1A = unnamed_addr constant { [5 x i8*] } { [5 x i8*] [i8* null, i8* bitcast ({ i8*, i8* }* @_ZTI1A to i8*), i8* bitcast (void (%struct.A*)* @_ZN1A1fEv to i8*),

[PATCH] D51683: Fix arm_neon.h and arm_fp16.h generation for compiling with std=c89

2018-09-06 Thread Dave Green via Phabricator via cfe-commits
dmgreen added a comment.

These new changes look good to me.

If you are updating things like this, it's often better to create a new Phab 
review so it's easier to see it's a new thing (or, in cases like this where the 
changes are simple and just test updates, they often don't need review). Either 
way, LGTM. Thanks for the test fix.


https://reviews.llvm.org/D51683



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D51416: [RTTI] Align rtti types to prevent over-alignment

2018-09-10 Thread Dave Green via Phabricator via cfe-commits
dmgreen updated this revision to Diff 164693.
dmgreen added a comment.

Now using getABITypeAlignment, plus added a simple test


https://reviews.llvm.org/D51416

Files:
  lib/CodeGen/CGVTT.cpp
  lib/CodeGen/CGVTables.cpp
  lib/CodeGen/CodeGenModule.cpp
  lib/CodeGen/CodeGenModule.h
  lib/CodeGen/ItaniumCXXABI.cpp
  lib/CodeGen/MicrosoftCXXABI.cpp
  test/CodeGenCXX/microsoft-abi-vbtables.cpp
  test/CodeGenCXX/vtable-align.cpp
  test/CodeGenCXX/vtable-linkage.cpp

Index: test/CodeGenCXX/vtable-linkage.cpp
===
--- test/CodeGenCXX/vtable-linkage.cpp
+++ test/CodeGenCXX/vtable-linkage.cpp
@@ -98,10 +98,10 @@
 
 // C has no key function, so its vtable should have weak_odr linkage
 // and hidden visibility (rdar://problem/7523229).
-// CHECK-DAG: @_ZTV1C = linkonce_odr unnamed_addr constant {{.*}}, comdat,
-// CHECK-DAG: @_ZTS1C = linkonce_odr constant {{.*}}, comdat{{$}}
-// CHECK-DAG: @_ZTI1C = linkonce_odr constant {{.*}}, comdat{{$}}
-// CHECK-DAG: @_ZTT1C = linkonce_odr unnamed_addr constant {{.*}}, comdat{{$}}
+// CHECK-DAG: @_ZTV1C = linkonce_odr unnamed_addr constant {{.*}}, comdat, align 8{{$}}
+// CHECK-DAG: @_ZTS1C = linkonce_odr constant {{.*}}, comdat, align 1{{$}}
+// CHECK-DAG: @_ZTI1C = linkonce_odr constant {{.*}}, comdat, align 8{{$}}
+// CHECK-DAG: @_ZTT1C = linkonce_odr unnamed_addr constant {{.*}}, comdat, align 8{{$}}
 
 // D has a key function that is defined in this translation unit so its vtable is
 // defined in the translation unit.
@@ -120,27 +120,27 @@
 // defined in this translation unit, so its vtable should have
 // weak_odr linkage.
 // CHECK-DAG: @_ZTV1EIsE = weak_odr unnamed_addr constant {{.*}}, comdat,
-// CHECK-DAG: @_ZTS1EIsE = weak_odr constant {{.*}}, comdat{{$}}
-// CHECK-DAG: @_ZTI1EIsE = weak_odr constant {{.*}}, comdat{{$}}
+// CHECK-DAG: @_ZTS1EIsE = weak_odr constant {{.*}}, comdat, align 1{{$}}
+// CHECK-DAG: @_ZTI1EIsE = weak_odr constant {{.*}}, comdat, align 8{{$}}
 
 // F is an explicit template instantiation without a key
 // function, so its vtable should have weak_odr linkage
 // CHECK-DAG: @_ZTV1FIsE = weak_odr unnamed_addr constant {{.*}}, comdat,
-// CHECK-DAG: @_ZTS1FIsE = weak_odr constant {{.*}}, comdat{{$}}
-// CHECK-DAG: @_ZTI1FIsE = weak_odr constant {{.*}}, comdat{{$}}
+// CHECK-DAG: @_ZTS1FIsE = weak_odr constant {{.*}}, comdat, align 1{{$}}
+// CHECK-DAG: @_ZTI1FIsE = weak_odr constant {{.*}}, comdat, align 8{{$}}
 
 // E is an implicit template instantiation with a key function
 // defined in this translation unit, so its vtable should have
 // linkonce_odr linkage.
 // CHECK-DAG: @_ZTV1EIlE = linkonce_odr unnamed_addr constant {{.*}}, comdat,
-// CHECK-DAG: @_ZTS1EIlE = linkonce_odr constant {{.*}}, comdat{{$}}
-// CHECK-DAG: @_ZTI1EIlE = linkonce_odr constant {{.*}}, comdat{{$}}
+// CHECK-DAG: @_ZTS1EIlE = linkonce_odr constant {{.*}}, comdat, align 1{{$}}
+// CHECK-DAG: @_ZTI1EIlE = linkonce_odr constant {{.*}}, comdat, align 8{{$}}
 
 // F is an implicit template instantiation with no key function,
 // so its vtable should have linkonce_odr linkage.
 // CHECK-DAG: @_ZTV1FIlE = linkonce_odr unnamed_addr constant {{.*}}, comdat,
-// CHECK-DAG: @_ZTS1FIlE = linkonce_odr constant {{.*}}, comdat{{$}}
-// CHECK-DAG: @_ZTI1FIlE = linkonce_odr constant {{.*}}, comdat{{$}}
+// CHECK-DAG: @_ZTS1FIlE = linkonce_odr constant {{.*}}, comdat, align 1{{$}}
+// CHECK-DAG: @_ZTI1FIlE = linkonce_odr constant {{.*}}, comdat, align 8{{$}}
 
 // F is an explicit template instantiation declaration without a
 // key function, so its vtable should have external linkage.
@@ -171,8 +171,8 @@
 // F is an explicit specialization without a key function, so
 // its vtable should have linkonce_odr linkage.
 // CHECK-DAG: @_ZTV1FIcE = linkonce_odr unnamed_addr constant {{.*}}, comdat,
-// CHECK-DAG: @_ZTS1FIcE = linkonce_odr constant {{.*}}, comdat{{$}}
-// CHECK-DAG: @_ZTI1FIcE = linkonce_odr constant {{.*}}, comdat{{$}}
+// CHECK-DAG: @_ZTS1FIcE = linkonce_odr constant {{.*}}, comdat, align 1{{$}}
+// CHECK-DAG: @_ZTI1FIcE = linkonce_odr constant {{.*}}, comdat, align 8{{$}}
 
 // CHECK-DAG: @_ZTV1GIiE = linkonce_odr unnamed_addr constant {{.*}}, comdat,
 template 
Index: test/CodeGenCXX/vtable-align.cpp
===
--- test/CodeGenCXX/vtable-align.cpp
+++ test/CodeGenCXX/vtable-align.cpp
@@ -10,5 +10,8 @@
 void A::f() {}
 
 // CHECK-32: @_ZTV1A = unnamed_addr constant { [5 x i8*] } { [5 x i8*] [i8* null, i8* bitcast ({ i8*, i8* }* @_ZTI1A to i8*), i8* bitcast (void (%struct.A*)* @_ZN1A1fEv to i8*), i8* bitcast (void (%struct.A*)* @_ZN1A1gEv to i8*), i8* bitcast (void (%struct.A*)* @_ZN1A1hEv to i8*)] }, align 4
-
+// CHECK-32: @_ZTS1A = constant [3 x i8] c"1A\00", align 1
+// CHECK-32: @_ZTI1A = constant { i8*, i8* } { i8* bitcast (i8** getelementptr inbounds (i8*, i8** @_ZTVN10__cxxabiv117__class_type_infoE, i32 2) to i8*), i8* getelementptr inbounds 

[PATCH] D51416: [RTTI] Align rtti types to prevent over-alignment

2018-09-12 Thread Dave Green via Phabricator via cfe-commits
dmgreen added a comment.

That sounds like a good idea. Thanks for the help on this one.


https://reviews.llvm.org/D51416



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D51416: [RTTI] Align rtti types to prevent over-alignment

2018-09-12 Thread Dave Green via Phabricator via cfe-commits
This revision was automatically updated to reflect the committed changes.
Closed by commit rL342053: [CodeGen] Align rtti and vtable data (authored by 
dmgreen, committed by ).
Herald added a subscriber: llvm-commits.

Changed prior to commit:
  https://reviews.llvm.org/D51416?vs=164693&id=165077#toc

Repository:
  rL LLVM

https://reviews.llvm.org/D51416

Files:
  cfe/trunk/lib/CodeGen/CGVTT.cpp
  cfe/trunk/lib/CodeGen/CGVTables.cpp
  cfe/trunk/lib/CodeGen/CodeGenModule.cpp
  cfe/trunk/lib/CodeGen/CodeGenModule.h
  cfe/trunk/lib/CodeGen/ItaniumCXXABI.cpp
  cfe/trunk/lib/CodeGen/MicrosoftCXXABI.cpp
  cfe/trunk/test/CodeGenCXX/microsoft-abi-vbtables.cpp
  cfe/trunk/test/CodeGenCXX/vtable-align.cpp
  cfe/trunk/test/CodeGenCXX/vtable-linkage.cpp

Index: cfe/trunk/test/CodeGenCXX/vtable-align.cpp
===
--- cfe/trunk/test/CodeGenCXX/vtable-align.cpp
+++ cfe/trunk/test/CodeGenCXX/vtable-align.cpp
@@ -10,5 +10,8 @@
 void A::f() {}
 
 // CHECK-32: @_ZTV1A = unnamed_addr constant { [5 x i8*] } { [5 x i8*] [i8* null, i8* bitcast ({ i8*, i8* }* @_ZTI1A to i8*), i8* bitcast (void (%struct.A*)* @_ZN1A1fEv to i8*), i8* bitcast (void (%struct.A*)* @_ZN1A1gEv to i8*), i8* bitcast (void (%struct.A*)* @_ZN1A1hEv to i8*)] }, align 4
-
+// CHECK-32: @_ZTS1A = constant [3 x i8] c"1A\00", align 1
+// CHECK-32: @_ZTI1A = constant { i8*, i8* } { i8* bitcast (i8** getelementptr inbounds (i8*, i8** @_ZTVN10__cxxabiv117__class_type_infoE, i32 2) to i8*), i8* getelementptr inbounds ([3 x i8], [3 x i8]* @_ZTS1A, i32 0, i32 0) }, align 4
 // CHECK-64: @_ZTV1A = unnamed_addr constant { [5 x i8*] } { [5 x i8*] [i8* null, i8* bitcast ({ i8*, i8* }* @_ZTI1A to i8*), i8* bitcast (void (%struct.A*)* @_ZN1A1fEv to i8*), i8* bitcast (void (%struct.A*)* @_ZN1A1gEv to i8*), i8* bitcast (void (%struct.A*)* @_ZN1A1hEv to i8*)] }, align 8
+// CHECK-64: @_ZTS1A = constant [3 x i8] c"1A\00", align 1
+// CHECK-64: @_ZTI1A = constant { i8*, i8* } { i8* bitcast (i8** getelementptr inbounds (i8*, i8** @_ZTVN10__cxxabiv117__class_type_infoE, i64 2) to i8*), i8* getelementptr inbounds ([3 x i8], [3 x i8]* @_ZTS1A, i32 0, i32 0) }, align 8
Index: cfe/trunk/test/CodeGenCXX/vtable-linkage.cpp
===
--- cfe/trunk/test/CodeGenCXX/vtable-linkage.cpp
+++ cfe/trunk/test/CodeGenCXX/vtable-linkage.cpp
@@ -98,10 +98,10 @@
 
 // C has no key function, so its vtable should have weak_odr linkage
 // and hidden visibility (rdar://problem/7523229).
-// CHECK-DAG: @_ZTV1C = linkonce_odr unnamed_addr constant {{.*}}, comdat,
-// CHECK-DAG: @_ZTS1C = linkonce_odr constant {{.*}}, comdat{{$}}
-// CHECK-DAG: @_ZTI1C = linkonce_odr constant {{.*}}, comdat{{$}}
-// CHECK-DAG: @_ZTT1C = linkonce_odr unnamed_addr constant {{.*}}, comdat{{$}}
+// CHECK-DAG: @_ZTV1C = linkonce_odr unnamed_addr constant {{.*}}, comdat, align 8{{$}}
+// CHECK-DAG: @_ZTS1C = linkonce_odr constant {{.*}}, comdat, align 1{{$}}
+// CHECK-DAG: @_ZTI1C = linkonce_odr constant {{.*}}, comdat, align 8{{$}}
+// CHECK-DAG: @_ZTT1C = linkonce_odr unnamed_addr constant {{.*}}, comdat, align 8{{$}}
 
 // D has a key function that is defined in this translation unit so its vtable is
 // defined in the translation unit.
@@ -120,27 +120,27 @@
 // defined in this translation unit, so its vtable should have
 // weak_odr linkage.
 // CHECK-DAG: @_ZTV1EIsE = weak_odr unnamed_addr constant {{.*}}, comdat,
-// CHECK-DAG: @_ZTS1EIsE = weak_odr constant {{.*}}, comdat{{$}}
-// CHECK-DAG: @_ZTI1EIsE = weak_odr constant {{.*}}, comdat{{$}}
+// CHECK-DAG: @_ZTS1EIsE = weak_odr constant {{.*}}, comdat, align 1{{$}}
+// CHECK-DAG: @_ZTI1EIsE = weak_odr constant {{.*}}, comdat, align 8{{$}}
 
 // F is an explicit template instantiation without a key
 // function, so its vtable should have weak_odr linkage
 // CHECK-DAG: @_ZTV1FIsE = weak_odr unnamed_addr constant {{.*}}, comdat,
-// CHECK-DAG: @_ZTS1FIsE = weak_odr constant {{.*}}, comdat{{$}}
-// CHECK-DAG: @_ZTI1FIsE = weak_odr constant {{.*}}, comdat{{$}}
+// CHECK-DAG: @_ZTS1FIsE = weak_odr constant {{.*}}, comdat, align 1{{$}}
+// CHECK-DAG: @_ZTI1FIsE = weak_odr constant {{.*}}, comdat, align 8{{$}}
 
 // E is an implicit template instantiation with a key function
 // defined in this translation unit, so its vtable should have
 // linkonce_odr linkage.
 // CHECK-DAG: @_ZTV1EIlE = linkonce_odr unnamed_addr constant {{.*}}, comdat,
-// CHECK-DAG: @_ZTS1EIlE = linkonce_odr constant {{.*}}, comdat{{$}}
-// CHECK-DAG: @_ZTI1EIlE = linkonce_odr constant {{.*}}, comdat{{$}}
+// CHECK-DAG: @_ZTS1EIlE = linkonce_odr constant {{.*}}, comdat, align 1{{$}}
+// CHECK-DAG: @_ZTI1EIlE = linkonce_odr constant {{.*}}, comdat, align 8{{$}}
 
 // F is an implicit template instantiation with no key function,
 // so its vtable should have linkonce_odr linkage.
 // CHECK-DAG: @_ZTV1FIlE = linkonce_odr unnamed_addr constant {{.*}}, comdat,
-// CHECK-DAG: @_ZTS1FIlE = linkonce

[PATCH] D49674: [AArch64] Add Tiny Code Model for AArch64

2018-07-23 Thread Dave Green via Phabricator via cfe-commits
dmgreen created this revision.
dmgreen added reviewers: t.p.northover, olista01, john.brawn.
Herald added a reviewer: javed.absar.
Herald added a subscriber: kristof.beyls.

Adds a tiny code model as a Clang side of https://reviews.llvm.org/D49673.


https://reviews.llvm.org/D49674

Files:
  include/clang/Driver/CC1Options.td
  lib/CodeGen/BackendUtil.cpp
  lib/Frontend/CompilerInvocation.cpp
  test/Driver/code-model.c


Index: test/Driver/code-model.c
===
--- test/Driver/code-model.c
+++ test/Driver/code-model.c
@@ -1,9 +1,11 @@
+// RUN: %clang -### -c -mcmodel=tiny %s 2>&1 | FileCheck -check-prefix 
CHECK-TINY %s
 // RUN: %clang -### -c -mcmodel=small %s 2>&1 | FileCheck -check-prefix 
CHECK-SMALL %s
 // RUN: %clang -### -S -mcmodel=kernel %s 2>&1 | FileCheck -check-prefix 
CHECK-KERNEL %s
 // RUN: %clang -### -c -mcmodel=medium %s 2>&1 | FileCheck -check-prefix 
CHECK-MEDIUM %s
 // RUN: %clang -### -S -mcmodel=large %s 2>&1 | FileCheck -check-prefix 
CHECK-LARGE %s
 // RUN: not %clang -c -mcmodel=lager %s 2>&1 | FileCheck -check-prefix 
CHECK-INVALID %s
 
+// CHECK-TINY: "-mcode-model" "tiny"
 // CHECK-SMALL: "-mcode-model" "small"
 // CHECK-KERNEL: "-mcode-model" "kernel"
 // CHECK-MEDIUM: "-mcode-model" "medium"
Index: lib/Frontend/CompilerInvocation.cpp
===
--- lib/Frontend/CompilerInvocation.cpp
+++ lib/Frontend/CompilerInvocation.cpp
@@ -369,7 +369,7 @@
   if (Arg *A = Args.getLastArg(OPT_mcode_model)) {
 StringRef Value = A->getValue();
 if (Value == "small" || Value == "kernel" || Value == "medium" ||
-Value == "large")
+Value == "large" || Value == "tiny")
   return Value;
 Diags.Report(diag::err_drv_invalid_value) << A->getAsString(Args) << Value;
   }
Index: lib/CodeGen/BackendUtil.cpp
===
--- lib/CodeGen/BackendUtil.cpp
+++ lib/CodeGen/BackendUtil.cpp
@@ -368,6 +368,7 @@
 static Optional
 getCodeModel(const CodeGenOptions &CodeGenOpts) {
   unsigned CodeModel = llvm::StringSwitch(CodeGenOpts.CodeModel)
+   .Case("tiny", llvm::CodeModel::Tiny)
.Case("small", llvm::CodeModel::Small)
.Case("kernel", llvm::CodeModel::Kernel)
.Case("medium", llvm::CodeModel::Medium)
Index: include/clang/Driver/CC1Options.td
===
--- include/clang/Driver/CC1Options.td
+++ include/clang/Driver/CC1Options.td
@@ -253,7 +253,7 @@
 def masm_verbose : Flag<["-"], "masm-verbose">,
   HelpText<"Generate verbose assembly output">;
 def mcode_model : Separate<["-"], "mcode-model">,
-  HelpText<"The code model to use">, Values<"small,kernel,medium,large">;
+  HelpText<"The code model to use">, Values<"tiny,small,kernel,medium,large">;
 def mdebug_pass : Separate<["-"], "mdebug-pass">,
   HelpText<"Enable additional debug output">;
 def mdisable_fp_elim : Flag<["-"], "mdisable-fp-elim">,


Index: test/Driver/code-model.c
===
--- test/Driver/code-model.c
+++ test/Driver/code-model.c
@@ -1,9 +1,11 @@
+// RUN: %clang -### -c -mcmodel=tiny %s 2>&1 | FileCheck -check-prefix CHECK-TINY %s
 // RUN: %clang -### -c -mcmodel=small %s 2>&1 | FileCheck -check-prefix CHECK-SMALL %s
 // RUN: %clang -### -S -mcmodel=kernel %s 2>&1 | FileCheck -check-prefix CHECK-KERNEL %s
 // RUN: %clang -### -c -mcmodel=medium %s 2>&1 | FileCheck -check-prefix CHECK-MEDIUM %s
 // RUN: %clang -### -S -mcmodel=large %s 2>&1 | FileCheck -check-prefix CHECK-LARGE %s
 // RUN: not %clang -c -mcmodel=lager %s 2>&1 | FileCheck -check-prefix CHECK-INVALID %s
 
+// CHECK-TINY: "-mcode-model" "tiny"
 // CHECK-SMALL: "-mcode-model" "small"
 // CHECK-KERNEL: "-mcode-model" "kernel"
 // CHECK-MEDIUM: "-mcode-model" "medium"
Index: lib/Frontend/CompilerInvocation.cpp
===
--- lib/Frontend/CompilerInvocation.cpp
+++ lib/Frontend/CompilerInvocation.cpp
@@ -369,7 +369,7 @@
   if (Arg *A = Args.getLastArg(OPT_mcode_model)) {
 StringRef Value = A->getValue();
 if (Value == "small" || Value == "kernel" || Value == "medium" ||
-Value == "large")
+Value == "large" || Value == "tiny")
   return Value;
 Diags.Report(diag::err_drv_invalid_value) << A->getAsString(Args) << Value;
   }
Index: lib/CodeGen/BackendUtil.cpp
===
--- lib/CodeGen/BackendUtil.cpp
+++ lib/CodeGen/BackendUtil.cpp
@@ -368,6 +368,7 @@
 static Optional
 getCodeModel(const CodeGenOptions &CodeGenOpts) {
   unsigned CodeModel = llvm::StringSwitch(CodeGenOpts.CodeModel)
+   .Case("tiny", llvm::CodeModel::Tiny)
.Case("small", llvm::CodeModel::Small)
   

[PATCH] D47267: [UnrollAndJam] Add unroll_and_jam pragma handling

2018-07-31 Thread Dave Green via Phabricator via cfe-commits
dmgreen updated this revision to Diff 158219.
dmgreen added a comment.

Rebase.

Michael, you happy with this part?

The pragma clang loop part is off in https://reviews.llvm.org/D47320. Let me 
know if/when I should do something with that.


https://reviews.llvm.org/D47267

Files:
  include/clang/Basic/Attr.td
  include/clang/Parse/Parser.h
  lib/CodeGen/CGLoopInfo.cpp
  lib/CodeGen/CGLoopInfo.h
  lib/Parse/ParsePragma.cpp
  lib/Sema/SemaStmtAttr.cpp
  test/CodeGenCXX/pragma-unroll-and-jam.cpp
  test/Parser/pragma-unroll-and-jam.cpp

Index: test/Parser/pragma-unroll-and-jam.cpp
===
--- /dev/null
+++ test/Parser/pragma-unroll-and-jam.cpp
@@ -0,0 +1,78 @@
+// RUN: %clang_cc1 -std=c++11 -verify %s
+
+// Note that this puts the expected lines before the directives to work around
+// limitations in the -verify mode.
+
+void test(int *List, int Length, int Value) {
+  int i = 0;
+
+#pragma unroll_and_jam
+  for (int i = 0; i < Length; i++) {
+for (int j = 0; j < Length; j++) {
+  List[i * Length + j] = Value;
+}
+  }
+
+#pragma nounroll_and_jam
+  for (int i = 0; i < Length; i++) {
+for (int j = 0; j < Length; j++) {
+  List[i * Length + j] = Value;
+}
+  }
+
+#pragma unroll_and_jam 4
+  for (int i = 0; i < Length; i++) {
+for (int j = 0; j < Length; j++) {
+  List[i * Length + j] = Value;
+}
+  }
+
+/* expected-error {{expected ')'}} */ #pragma unroll_and_jam(4
+/* expected-error {{missing argument; expected an integer value}} */ #pragma unroll_and_jam()
+/* expected-warning {{extra tokens at end of '#pragma unroll_and_jam'}} */ #pragma unroll_and_jam 1 2
+  for (int i = 0; i < Length; i++) {
+for (int j = 0; j < Length; j++) {
+  List[i * Length + j] = Value;
+}
+  }
+
+/* expected-warning {{extra tokens at end of '#pragma nounroll_and_jam'}} */ #pragma nounroll_and_jam 1
+  for (int i = 0; i < Length; i++) {
+for (int j = 0; j < Length; j++) {
+  List[i * Length + j] = Value;
+}
+  }
+
+#pragma unroll_and_jam
+/* expected-error {{expected a for, while, or do-while loop to follow '#pragma unroll_and_jam'}} */ int j = Length;
+#pragma unroll_and_jam 4
+/* expected-error {{expected a for, while, or do-while loop to follow '#pragma unroll_and_jam'}} */ int k = Length;
+#pragma nounroll_and_jam
+/* expected-error {{expected a for, while, or do-while loop to follow '#pragma nounroll_and_jam'}} */ int l = Length;
+
+/* expected-error {{incompatible directives '#pragma nounroll_and_jam' and '#pragma unroll_and_jam(4)'}} */ #pragma unroll_and_jam 4
+#pragma nounroll_and_jam
+  for (int i = 0; i < Length; i++) {
+for (int j = 0; j < Length; j++) {
+  List[i * Length + j] = Value;
+}
+  }
+
+#pragma nounroll_and_jam
+#pragma unroll(4)
+  for (int i = 0; i < Length; i++) {
+for (int j = 0; j < Length; j++) {
+  List[i * Length + j] = Value;
+}
+  }
+
+// pragma clang unroll_and_jam is disabled for the moment
+/* expected-error {{invalid option 'unroll_and_jam'; expected vectorize, vectorize_width, interleave, interleave_count, unroll, unroll_count, or distribute}} */ #pragma clang loop unroll_and_jam(4)
+  for (int i = 0; i < Length; i++) {
+for (int j = 0; j < Length; j++) {
+  List[i * Length + j] = Value;
+}
+  }
+
+#pragma unroll_and_jam
+/* expected-error {{expected statement}} */ }
Index: test/CodeGenCXX/pragma-unroll-and-jam.cpp
===
--- /dev/null
+++ test/CodeGenCXX/pragma-unroll-and-jam.cpp
@@ -0,0 +1,55 @@
+// RUN: %clang_cc1 -triple arm-none-eabi -std=c++11 -emit-llvm -o - %s | FileCheck %s
+
+void unroll_and_jam(int *List, int Length, int Value) {
+  // CHECK: define {{.*}} @_Z14unroll_and_jam
+#pragma unroll_and_jam
+  for (int i = 0; i < Length; i++) {
+for (int j = 0; j < Length; j++) {
+  // CHECK: br label {{.*}}, !llvm.loop ![[LOOP_1:.*]]
+  List[i * Length + j] = Value;
+}
+  }
+}
+
+void unroll_and_jam_count(int *List, int Length, int Value) {
+  // CHECK: define {{.*}} @_Z20unroll_and_jam_count
+#pragma unroll_and_jam(4)
+  for (int i = 0; i < Length; i++) {
+for (int j = 0; j < Length; j++) {
+  // CHECK: br label {{.*}}, !llvm.loop ![[LOOP_2:.*]]
+  List[i * Length + j] = Value;
+}
+  }
+}
+
+void nounroll_and_jam(int *List, int Length, int Value) {
+  // CHECK: define {{.*}} @_Z16nounroll_and_jam
+#pragma nounroll_and_jam
+  for (int i = 0; i < Length; i++) {
+for (int j = 0; j < Length; j++) {
+  // CHECK: br label {{.*}}, !llvm.loop ![[LOOP_3:.*]]
+  List[i * Length + j] = Value;
+}
+  }
+}
+
+void clang_unroll_plus_nounroll_and_jam(int *List, int Length, int Value) {
+  // CHECK: define {{.*}} @_Z34clang_unroll_plus_nounroll_and_jam
+#pragma nounroll_and_jam
+#pragma unroll(4)
+  for (int i = 0; i < Length; i++) {
+for (int j = 0; j < Length; j++) {
+  // CHECK: br label {{.*}}, !llvm.loop ![[LOOP_7:.*]]
+  List[i *

[PATCH] D47267: [UnrollAndJam] Add unroll_and_jam pragma handling

2018-08-01 Thread Dave Green via Phabricator via cfe-commits
This revision was automatically updated to reflect the committed changes.
Closed by commit rC338566: [UnrollAndJam] Add unroll_and_jam pragma handling 
(authored by dmgreen, committed by ).

Changed prior to commit:
  https://reviews.llvm.org/D47267?vs=158219&id=158537#toc

Repository:
  rC Clang

https://reviews.llvm.org/D47267

Files:
  include/clang/Basic/Attr.td
  include/clang/Parse/Parser.h
  lib/CodeGen/CGLoopInfo.cpp
  lib/CodeGen/CGLoopInfo.h
  lib/Parse/ParsePragma.cpp
  lib/Sema/SemaStmtAttr.cpp
  test/CodeGenCXX/pragma-unroll-and-jam.cpp
  test/Parser/pragma-unroll-and-jam.cpp

Index: include/clang/Basic/Attr.td
===
--- include/clang/Basic/Attr.td
+++ include/clang/Basic/Attr.td
@@ -2748,22 +2748,27 @@
   /// interleave_count: interleaves 'Value' loop iterations.
   /// unroll: fully unroll loop if State == Enable.
   /// unroll_count: unrolls loop 'Value' times.
+  /// unroll_and_jam: attempt to unroll and jam loop if State == Enable.
+  /// unroll_and_jam_count: unroll and jams loop 'Value' times.
   /// distribute: attempt to distribute loop if State == Enable
 
   /// #pragma unroll  directive
   /// : fully unrolls loop.
   /// boolean: fully unrolls loop if State == Enable.
   /// expression: unrolls loop 'Value' times.
 
   let Spellings = [Pragma<"clang", "loop">, Pragma<"", "unroll">,
-   Pragma<"", "nounroll">];
+   Pragma<"", "nounroll">, Pragma<"", "unroll_and_jam">,
+   Pragma<"", "nounroll_and_jam">];
 
   /// State of the loop optimization specified by the spelling.
   let Args = [EnumArgument<"Option", "OptionType",
   ["vectorize", "vectorize_width", "interleave", "interleave_count",
-   "unroll", "unroll_count", "distribute"],
+   "unroll", "unroll_count", "unroll_and_jam", "unroll_and_jam_count",
+   "distribute"],
   ["Vectorize", "VectorizeWidth", "Interleave", "InterleaveCount",
-   "Unroll", "UnrollCount", "Distribute"]>,
+   "Unroll", "UnrollCount", "UnrollAndJam", "UnrollAndJamCount",
+   "Distribute"]>,
   EnumArgument<"State", "LoopHintState",
["enable", "disable", "numeric", "assume_safety", "full"],
["Enable", "Disable", "Numeric", "AssumeSafety", "Full"]>,
@@ -2778,6 +2783,8 @@
 case InterleaveCount: return "interleave_count";
 case Unroll: return "unroll";
 case UnrollCount: return "unroll_count";
+case UnrollAndJam: return "unroll_and_jam";
+case UnrollAndJamCount: return "unroll_and_jam_count";
 case Distribute: return "distribute";
 }
 llvm_unreachable("Unhandled LoopHint option.");
@@ -2787,9 +2794,9 @@
 unsigned SpellingIndex = getSpellingListIndex();
 // For "#pragma unroll" and "#pragma nounroll" the string "unroll" or
 // "nounroll" is already emitted as the pragma name.
-if (SpellingIndex == Pragma_nounroll)
+if (SpellingIndex == Pragma_nounroll || SpellingIndex == Pragma_nounroll_and_jam)
   return;
-else if (SpellingIndex == Pragma_unroll) {
+else if (SpellingIndex == Pragma_unroll || SpellingIndex == Pragma_unroll_and_jam) {
   OS << ' ' << getValueString(Policy);
   return;
 }
@@ -2825,6 +2832,11 @@
   return "#pragma nounroll";
 else if (SpellingIndex == Pragma_unroll)
   return "#pragma unroll" + (option == UnrollCount ? getValueString(Policy) : "");
+else if (SpellingIndex == Pragma_nounroll_and_jam)
+  return "#pragma nounroll_and_jam";
+else if (SpellingIndex == Pragma_unroll_and_jam)
+  return "#pragma unroll_and_jam" +
+(option == UnrollAndJamCount ? getValueString(Policy) : "");
 
 assert(SpellingIndex == Pragma_clang_loop && "Unexpected spelling");
 return getOptionName(option) + getValueString(Policy);
Index: include/clang/Parse/Parser.h
===
--- include/clang/Parse/Parser.h
+++ include/clang/Parse/Parser.h
@@ -185,6 +185,8 @@
   std::unique_ptr LoopHintHandler;
   std::unique_ptr UnrollHintHandler;
   std::unique_ptr NoUnrollHintHandler;
+  std::unique_ptr UnrollAndJamHintHandler;
+  std::unique_ptr NoUnrollAndJamHintHandler;
   std::unique_ptr FPHandler;
   std::unique_ptr STDCFENVHandler;
   std::unique_ptr STDCCXLIMITHandler;
Index: test/Parser/pragma-unroll-and-jam.cpp
===
--- test/Parser/pragma-unroll-and-jam.cpp
+++ test/Parser/pragma-unroll-and-jam.cpp
@@ -0,0 +1,78 @@
+// RUN: %clang_cc1 -std=c++11 -verify %s
+
+// Note that this puts the expected lines before the directives to work around
+// limitations in the -verify mode.
+
+void test(int *List, int Length, int Value) {
+  int i = 0;
+
+#pragma unroll_and_jam
+  for (int 

[PATCH] D47267: [UnrollAndJam] Add unroll_and_jam pragma handling

2018-08-01 Thread Dave Green via Phabricator via cfe-commits
dmgreen added a comment.

Thanks.


Repository:
  rC Clang

https://reviews.llvm.org/D47267



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D47267: [UnrollAndJam] Add unroll_and_jam pragma handling

2018-06-05 Thread Dave Green via Phabricator via cfe-commits
dmgreen added a comment.

Thanks.

I noticed in the paper that you used the name "unrollandjam", minus 
underscores. Should I change this use that spelling here? I have no strong 
opinion of one over the other (was just using what I had found from the Intel 
docs).


https://reviews.llvm.org/D47267



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D47267: [UnrollAndJam] Add unroll_and_jam pragma handling

2018-06-05 Thread Dave Green via Phabricator via cfe-commits
dmgreen added a comment.

I quite like the UnrollAndFuse naming. I'd not heard that the xlc compiler 
called it that. The UnrollAndJam pass was origin named that before I renamed 
for similar reasons (UnrollAndJam being more well known).

I see your point about the mix of underscores. "nounroll_and_jam" also comes 
from the Intel docs, and theres already "nounroll" vs "unroll". The "no" 
becomes a qualifier on "unroll_and_jam". "no_unroll_and_jam" feels less 
consistent to me.

But if you have a strong opinion, I'm happy enough to change it.


https://reviews.llvm.org/D47267



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D47267: [UnrollAndJam] Add unroll_and_jam pragma handling

2018-06-06 Thread Dave Green via Phabricator via cfe-commits
dmgreen added a comment.

In https://reviews.llvm.org/D47267#1123318, @hfinkel wrote:

> I have a preference for using the underscores as our primary spelling. I 
> think that it's easier to read.


I agree with it being easier to read.

> I prefer we have a different syntax that we can use consistently within the 
> 'clang loop' pragmas. How about 'unroll_and_jam disable' or similar?

The code I had for #pragma clang loop (now in https://reviews.llvm.org/D47320, 
although I may not have split all the relevant parts into there) was doing the 
same thing as the unroll code. So worked the same way, I think looking like 
"#pragma clang loop unroll_and_jam(disable)" vs enable. It sounds sensible to 
me to have these look the same way as unroll clang loop pragmas, for both the 
old syntax and the new from the RFC.


https://reviews.llvm.org/D47267



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D57978: [CodeGen] Generate follow-up metadata for loops with more than one transformation.

2019-03-24 Thread Dave Green via Phabricator via cfe-commits
dmgreen added a comment.

Hello. I also don't feel very familiar with clang, but had a poke around and I 
think it looks pretty good. I see unroll and jam is being awkward again.

This could maybe do with a few extra tests. Am I correct in saying something 
like this:

  #pragma unroll_and_jam(4)
  for(int j = 0; j < n; j++) {
#pragma unroll(4)
for(int k = 0; k < n; k++) {
  x[j*n+k] = 10;
}
  }

would end up with a llvm.loop.unroll_and_jam.followup_inner with a 
llvm.loop.unroll_count?




Comment at: lib/CodeGen/CGLoopInfo.cpp:500
+// Unroll-and-jam of an inner loop and unroll-and-jam of the same loop as
+// the outer loop does not make much sense, but we have to pick an order.
+AfterJam.UnrollAndJamCount = Attrs.UnrollAndJamCount;

I was having trouble parsing this sentance. Does it mean both the inner loop 
and the outer loop both have unroll-and-jam? UnrollAndJam processes loops from 
inner to outer, so if this is working as I think, maybe it should be put into 
BeforeJam.



Comment at: lib/CodeGen/CGLoopInfo.cpp:502
+AfterJam.UnrollAndJamCount = Attrs.UnrollAndJamCount;
+AfterJam.UnrollAndJamEnable = AfterJam.UnrollAndJamEnable;
+

 = Attrs.UnrollAndJamEnable ?



Comment at: lib/CodeGen/CGLoopInfo.h:135
+  ///  LoopID.
+  /// @param HasUserTransforms [out] Set to true if the returned MDNode 
encoodes
+  ///  at least one transformation.

*encodes



Comment at: lib/CodeGen/CGLoopInfo.h:170
+  /// @param Attrs This loop's attributes and transfomations.
+  /// @param HasUserTransforms [out] Set to true if the returned MDNode 
encoodes
+  ///  at least one transformation.

*encodes


Repository:
  rC Clang

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D57978/new/

https://reviews.llvm.org/D57978



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D57978: [CodeGen] Generate follow-up metadata for loops with more than one transformation.

2019-03-26 Thread Dave Green via Phabricator via cfe-commits
dmgreen accepted this revision.
dmgreen added a comment.
This revision is now accepted and ready to land.

Yeah, OK. This looks like a good patch to me. As I said, I'm not a clang 
expert, but the code looks sensible enough. (Perhaps wait a couple of days in 
case others have objections.)


Repository:
  rC Clang

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D57978/new/

https://reviews.llvm.org/D57978



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D59879: [ARM][CMSE] Add commandline option and feature macro

2019-03-29 Thread Dave Green via Phabricator via cfe-commits
dmgreen added reviewers: olista01, christof.
dmgreen added inline comments.



Comment at: include/clang/Driver/Options.td:2145
+  Flags<[DriverOption,CC1Option]>,
+  HelpText<"Allow use of CMSE instructions (ARM only)">;
 

Should this say something about compiling for secure state?



Comment at: lib/Basic/Targets/ARM.cpp:438
+} else if (Feature == "+8msecext") {
+  if ((CPUProfile != "M" && CPUProfile != "B") || ArchVersion != 8) {
+Diags.Report(diag::err_target_unsupported_mcmse) << CPU;

snidertm wrote:
> How does CPUProfile get a value of "B"? I thought any Cortex-M processor 
> would set CPUProfile to "M". Is CMSE available on a processor besides 
> Cortex-m33?
"B" was going to be a very old name for v8m-baseline, looks like this one was 
never cleaned up when that was changed. You can drop the != "B" check.



Comment at: lib/Driver/ToolChains/Clang.cpp:1390
+
+  if (Args.getLastArg(options::OPT_mcmse)) {
+CmdArgs.push_back("-mcmse");

Don't need the brackets here



Comment at: lib/Frontend/CompilerInvocation.cpp:2705
   Opts.GNUAsm = !Args.hasArg(OPT_fno_gnu_inline_asm);
+  Opts.Cmse = Args.hasArg(OPT_mcmse); // CMSE
 

This comment seems superfluous


Repository:
  rC Clang

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D59879/new/

https://reviews.llvm.org/D59879



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D60691: [ARM] Replace fp-only-sp and d16 with fp64 and d32.

2019-04-17 Thread Dave Green via Phabricator via cfe-commits
dmgreen added a comment.

I don't think we here care about auto-updating, not supporting bitcode/lto 
libraries.




Comment at: llvm/lib/Target/ARM/ARMFastISel.cpp:1362
+  if (Ty->isDoubleTy() && (!Subtarget->hasVFP2Base() || !Subtarget->hasFP64()))
+ return false;
 

Some of the formatting here got a little messed up.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D60691/new/

https://reviews.llvm.org/D60691



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D62729: [ARM] Fix recent breakage of -mfpu=none.

2019-05-31 Thread Dave Green via Phabricator via cfe-commits
dmgreen added inline comments.



Comment at: clang/test/CodeGen/arm-mfpu-none.c:2
+// REQUIRES: arm-registered-target
+// RUN: %clang -target arm-none-eabi -mcpu=cortex-m4 -mfpu=none -S -o - %s | 
FileCheck %s
+

lebedev.ri wrote:
> Generally clang codegen tests should test `-emit-llvm -S` output
Perhaps add to the tests in clang/test/Driver/arm-mfpu.c instead?


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D62729/new/

https://reviews.llvm.org/D62729



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D62729: [ARM] Fix recent breakage of -mfpu=none.

2019-05-31 Thread Dave Green via Phabricator via cfe-commits
dmgreen added inline comments.



Comment at: clang/test/CodeGen/arm-mfpu-none.c:2
+// REQUIRES: arm-registered-target
+// RUN: %clang -target arm-none-eabi -mcpu=cortex-m4 -mfpu=none -S -o - %s | 
FileCheck %s
+

simon_tatham wrote:
> dmgreen wrote:
> > lebedev.ri wrote:
> > > Generally clang codegen tests should test `-emit-llvm -S` output
> > Perhaps add to the tests in clang/test/Driver/arm-mfpu.c instead?
> @lebedev.ri : looking at the IR doesn't show up what I'm trying to test in 
> this case. And I modelled this on existing tests in test/CodeGen, such as 
> `arm-eabi.c`, which also need to test the real assembler output in order to 
> do their job.
> 
> @dmgreen: it's true that I could also make `test/Driver/arm-mfpu.c` check 
> that the extra feature name is appearing in the cc1 command line where I 
> currently think it should be, but I think that wouldn't replace this test. 
> The real point of this test is that the //currently// right feature names 
> might not stay right, if the feature set is reorganised again in future – and 
> if that happens, then whoever does it will probably rewrite most of 
> `arm-mfpu.c` anyway.
> 
> So this is an end-to-end check that //whatever// detailed command line is 
> generated by the driver in these circumstances, it is sufficient to ensure 
> that the final generated code really doesn't include FP instructions.
I believe that the idea is that if we test each step, so make sure we have 
tests from clang driver -> clang-cl and clang -> llvm-ir and llvm -> asm, then 
it should all be tested. And because each has a well defined interfaces along 
each step of the way it should keep working. Any integration tests are often 
left to external tests like the test suite.

But I do see some precedent for this in other places. If you add the extra 
checks in test/Driver/arm-mfpu.c, then I think this is fine. (And I guess if 
someone objects we can always take it out :) )


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D62729/new/

https://reviews.llvm.org/D62729



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D62729: [ARM] Fix recent breakage of -mfpu=none.

2019-06-03 Thread Dave Green via Phabricator via cfe-commits
dmgreen accepted this revision.
dmgreen added a comment.
This revision is now accepted and ready to land.

LGTM. Thanks for adding the new test. I agree with keeping both tests ( I 
wasn't very clear).


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D62729/new/

https://reviews.llvm.org/D62729



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D60710: [ARM] Add ACLE feature macros for MVE.

2019-06-07 Thread Dave Green via Phabricator via cfe-commits
dmgreen added a comment.

Hello. Hopefully fixed in rC362814 !


Repository:
  rL LLVM

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D60710/new/

https://reviews.llvm.org/D60710



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D59879: [ARM][CMSE] Add commandline option and feature macro

2019-05-16 Thread Dave Green via Phabricator via cfe-commits
dmgreen accepted this revision.
dmgreen added a comment.
This revision is now accepted and ready to land.

LGTM. With one comment that I will leave to you for what you think is best.




Comment at: include/clang/Driver/Options.td:2145
+  Flags<[DriverOption,CC1Option]>,
+  HelpText<"Allow use of CMSE instructions (ARM only)">;
 

dmgreen wrote:
> Should this say something about compiling for secure state?
Maybe "Enables generation of secure code for CMSE (Armv8-M Security 
Extensions)".

I meant to say that non-secure code can still be compiled without this flag, 
and can use some cmse features (as far as I understand). Its the secure side 
that needs this flag to do the clearing of registers and whatnot.


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D59879/new/

https://reviews.llvm.org/D59879



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D67160: [clang, ARM] Default to -fno-lax-vector-conversions in ARM v8.1-M.

2019-09-13 Thread Dave Green via Phabricator via cfe-commits
dmgreen added a comment.

FYI: rL371817 , in case it changes what is 
done here.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D67160/new/

https://reviews.llvm.org/D67160



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D62394: [ARM][CMSE] Add CMSE header & builtins

2019-09-17 Thread Dave Green via Phabricator via cfe-commits
dmgreen added a subscriber: chill.
dmgreen added a comment.

I'm afraid the upstreaming of CMSE has stalled, and this is not all that would 
be needed to get it working. This adds some header files and clang builtins, 
the selection of them in the backend isn't yet present, hence the error you are 
seeing. There are more patches to follow around lowering intrinsics and 
clearing registers correctly.


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D62394/new/

https://reviews.llvm.org/D62394



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D67161: [clang,ARM] Initial ACLE intrinsics for MVE.

2019-09-29 Thread Dave Green via Phabricator via cfe-commits
dmgreen added a comment.

Sorry. I wasn't ignoring this (sort-of), I just knew that you were on holiday 
and this is a bit of a big one.

But I like this. I still have to take a deeper look into the main tablegen 
parts, but it looks very powerful.

I presume from here adding new intrinsics is mostly a case of adding to 
arm_mve.td (with perhaps some minor parts in defs and maybe some custom bits 
here and there).




Comment at: clang/include/clang/Basic/DiagnosticSemaKinds.td:8529
   "argument should be a multiple of %0">;
+def err_argument_not_power_of_2 : Error<
+  "argument should be a power of 2">;

Do we have any/should we have some tests for these errors?



Comment at: clang/include/clang/Basic/arm_mve_defs.td:266
+// vidupq_wb_u16 -> vidupq_u16.
+def PNT_WB: PolymorphicNameType<0, "wb">;
+

These are not used yet, right? They are not meant for the vldr gathers, those 
don't have polymorphic names (if I'm reading this list of intrinsics right). 
These are for vidup's as the comment says?



Comment at: clang/test/CodeGen/arm-mve-intrinsics/scalar-shifts.c:2
+// NOTE: Assertions have been autogenerated by utils/update_cc_test_checks.py
+// RUN: %clang --target=arm-arm-none-eabi -march=armv8.1m.main+mve.fp 
-mfloat-abi=hard -O3 -S -emit-llvm -o - %s | FileCheck %s
+// RUN: %clang --target=arm-arm-none-eabi -march=armv8.1m.main+mve.fp 
-mfloat-abi=hard -DPOLYMORPHIC -O3 -S -emit-llvm -o - %s | FileCheck %s

These tests all run -O3, the entire pass pipeline. I see quite a few tests in 
the same folder do the same thing, but in this case we will be adding quite a 
few tests. Random mid-end optimizations may well keep on altering them.

Is it possible to use -disable-O0-optnone and pipe them into opt -mem2reg? What 
would that do to the codegen, would it be a lot more verbose than it is now?

Also could/should they be using clang_cc1?



Comment at: clang/test/CodeGen/arm-mve-intrinsics/scalar-shifts.c:3
+// RUN: %clang --target=arm-arm-none-eabi -march=armv8.1m.main+mve.fp 
-mfloat-abi=hard -O3 -S -emit-llvm -o - %s | FileCheck %s
+// RUN: %clang --target=arm-arm-none-eabi -march=armv8.1m.main+mve.fp 
-mfloat-abi=hard -DPOLYMORPHIC -O3 -S -emit-llvm -o - %s | FileCheck %s
+

POLYMORPHIC isn't used here. Same for vcvt below (Is there a polymorphic vcvt?)



Comment at: clang/test/CodeGen/arm-mve-intrinsics/vminvq.c:12
+//
+uint32_t test_vminvq_u32(uint32_t a, uint32x4_t b)
+{

Its probably worth trying to fill in tests for most types.



Comment at: clang/utils/TableGen/MveEmitter.cpp:82
+#if 0
+} // stop emacs from wanting to auto-indent everything to 2 spaces inside here
+#endif

Is this needed still? It seems like something that should be handled in 
clang-format/emacs directly.



Comment at: clang/utils/TableGen/MveEmitter.cpp:1403
+" *\n"
+" * Permission is hereby granted, free of charge, to any person "
+"obtaining a copy\n"

Clang format really made a mess of this, didn't it.

Is this is old license? Should it be updated to the new one. I imagine that 
these generated headers might well have been missed in the switchover.



Comment at: clang/utils/TableGen/MveEmitter.cpp:132
+// the pointee type.
+Pointer,
+  };

The gathers are really a Vector of Pointers, in a way. But in the intrinsics 
(and IR), they are just treated as a vector of ints, so I presume a new type is 
not needed? We may (but not yet) want to make use of the llvm masked gathers. 
We would have to add codegen support for them first though (which I don't think 
we have plans for in the near term).



Comment at: clang/utils/TableGen/MveEmitter.cpp:520
+  // time.
+  bool needs_visiting(unsigned Pass) {
+bool ToRet = Visited < Pass;

needsVisiting.

I would guess the same for other places that use snake_case.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D67161/new/

https://reviews.llvm.org/D67161



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D67161: [clang,ARM] Initial ACLE intrinsics for MVE.

2019-10-02 Thread Dave Green via Phabricator via cfe-commits
dmgreen added inline comments.



Comment at: clang/include/clang/Basic/arm_mve_defs.td:119
+// The type Void can be used for the return type of an intrinsic, and as the
+// parameter type for intrinsics that aren't actually parametrised by any kind
+// of _s32 / _f16 / _u8 suffix.

parametrised->parameterised



Comment at: clang/utils/TableGen/MveEmitter.cpp:207
+// pointer, especially if the pointee is also const.
+if (isa(Pointee)) {
+  if (Const)

Would making this always east const be simpler? Do we generate anything with 
inner pointer types? Should there be a whitespace before the "const " in Name 
+= "const "?



Comment at: clang/utils/TableGen/MveEmitter.cpp:251
+  }
+  virtual unsigned sizeInBits() const override { return Bits; }
+  ScalarTypeKind kind() const { return Kind; }

These don't need to be virtual and override, they can just be override. Same 
elsewhere.



Comment at: clang/utils/TableGen/MveEmitter.cpp:325
+return Element->c_name_base() + "x" + utostr(Registers);
+  }
+

How come this doesn't have/need a llvm_name? Are they just all handled manually?



Comment at: clang/utils/TableGen/MveEmitter.cpp:382
+// ones are the same across the whole set (so that no variable is needed at
+// all).
+//

Cool.

Do you know how much this helps? Do the benefits outweigh the costs in the 
complexity of this code?

And do you know, once we have added a lot of instruction how long this will 
take to run (I know, difficult question to answer without doing it first, but 
is it roughly O(n log n)? And memory will never be excessive? A very 
unscientific test I just ran made it seem pretty quick, but that was obviously 
without the number of intrinsics we will have in the end). We should try and 
ensure that we don't make this a compile time burden for llvm.



Comment at: clang/utils/TableGen/MveEmitter.cpp:455
+
+class Value {
+public:

Is it worth giving this a different name to more obviously distinguish it from 
llvm::Value?



Comment at: clang/utils/TableGen/MveEmitter.cpp:723
+CodeGenParamAllocator DummyParamAlloc;
+V->gen_code(nulls(), DummyParamAlloc);
+

What is this doing?



Comment at: clang/utils/TableGen/MveEmitter.cpp:755
+}
+std::list used;
+gen_code_dfs(Code, used, Pass);

Used

And why a list, out of interest?



Comment at: clang/utils/TableGen/MveEmitter.cpp:1023
+Value::Ptr Arg = getCodeForDagArg(D, 0, Scope, Param);
+if (const auto *ST = dyn_cast(CastType)) {
+  if (!ST->requires_float()) {

Do you envision this may require float const or vector casts in the future?



Comment at: clang/utils/TableGen/MveEmitter.cpp:1125
+  if (!PolymorphicNameType->isValueUnset("ExtraSuffixToDiscard")) {
+std::string ExtraSuffix =
+PolymorphicNameType->getValueAsString("ExtraSuffixToDiscard");

Some of these could use StringRef's more liberally, here and elsewhere.



Comment at: clang/utils/TableGen/MveEmitter.cpp:1595
+  // Is this parameter the same for all intrinsics in the group?
+  bool Constant = true;
+  auto it = kv.second.begin();

Can this use (something like):
bool Constant = llvm::all_of(kv.second, [&](const auto &OI) { return 
OI.ParamValues[i] == kv.second[0].ParamValues[i]; } ?


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D67161/new/

https://reviews.llvm.org/D67161



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D67161: [clang,ARM] Initial ACLE intrinsics for MVE.

2019-10-07 Thread Dave Green via Phabricator via cfe-commits
dmgreen added subscribers: samparker, SjoerdMeijer.
dmgreen added a comment.

This is looking good to me. My understanding is that is has some dependencies? 
The llvm side will likely needed to go in first, plus a couple of clang patches?




Comment at: clang/include/clang/Basic/DiagnosticSemaKinds.td:8529
   "argument should be a multiple of %0">;
+def err_argument_not_power_of_2 : Error<
+  "argument should be a power of 2">;

simon_tatham wrote:
> dmgreen wrote:
> > Do we have any/should we have some tests for these errors?
> By the time we finish implementing all the intrinsics, there will be tests 
> for these errors. The intrinsics that need them aren't in this preliminary 
> commit, though.
OK. That's fair.



Comment at: clang/test/CodeGen/arm-mve-intrinsics/scalar-shifts.c:2
+// NOTE: Assertions have been autogenerated by utils/update_cc_test_checks.py
+// RUN: %clang --target=arm-arm-none-eabi -march=armv8.1m.main+mve.fp 
-mfloat-abi=hard -O3 -S -emit-llvm -o - %s | FileCheck %s
+// RUN: %clang --target=arm-arm-none-eabi -march=armv8.1m.main+mve.fp 
-mfloat-abi=hard -DPOLYMORPHIC -O3 -S -emit-llvm -o - %s | FileCheck %s

simon_tatham wrote:
> dmgreen wrote:
> > These tests all run -O3, the entire pass pipeline. I see quite a few tests 
> > in the same folder do the same thing, but in this case we will be adding 
> > quite a few tests. Random mid-end optimizations may well keep on altering 
> > them.
> > 
> > Is it possible to use -disable-O0-optnone and pipe them into opt -mem2reg? 
> > What would that do to the codegen, would it be a lot more verbose than it 
> > is now?
> > 
> > Also could/should they be using clang_cc1?
> The immediate problem is that if you use any form of `clang | opt | 
> FileCheck` command line, then `update_cc_test_checks.py` says 'skipping 
> non-FileChecked line', because it doesn't support anything more complicated 
> than a two-stage pipeline consisting of clang and FileCheck.
> 
> I've enhanced `update_cc_test_checks` to handle that case, in D68406, and 
> respun these tests accordingly. But if that patch doesn't get approval then 
> I'll have to rethink this again.
> 
> (For `vld24.c` I ended up running `opt -sroa -early-cse` to avoid the IR 
> becoming noticeably more verbose. The rest worked fine with just `mem2reg`, 
> though.)
> 
> 
> Patching `update_cc_test_checks.py` to support more complex pipelines, it 
> seems to work OK: most codegen changes are trivial ones, such as modifiers 
> not appearing on IR operations (`call` instead of `tail call`, plain `shl` in 
> place of `shl nuw`). Only `vld24` becomes significantly more complicated: for 
> that one file I had to run `opt -mem2reg -sroa -early-cse` instead.
Yeah, they look OK. Thanks for making the change. It looks like a useful 
feature being added.



Comment at: clang/test/CodeGen/arm-mve-intrinsics/vcvt.c:13
+{
+return vcvttq_f16_f32(a, b);
+}

Tests for bottom too would be good to see too. In general, more tests are 
better.



Comment at: clang/test/CodeGen/arm-mve-intrinsics/vcvt.c:20
+// CHECK-NEXT:[[TMP1:%.*]] = call <4 x i1> @llvm.arm.mve.pred.i2v.v4i1(i32 
[[TMP0]])
+// CHECK-NEXT:[[TMP2:%.*]] = call <8 x half> 
@llvm.arm.mve.fltnarrow.predicated(<8 x half> [[A:%.*]], <4 x float> [[B:%.*]], 
i32 1, <4 x i1> [[TMP1]])
+// CHECK-NEXT:ret <8 x half> [[TMP2]]

Are these tests still using old names? I'm not missing something about how 
these are generated, am I?



Comment at: clang/utils/TableGen/MveEmitter.cpp:132
+// the pointee type.
+Pointer,
+  };

simon_tatham wrote:
> dmgreen wrote:
> > The gathers are really a Vector of Pointers, in a way. But in the 
> > intrinsics (and IR), they are just treated as a vector of ints, so I 
> > presume a new type is not needed? We may (but not yet) want to make use of 
> > the llvm masked gathers. We would have to add codegen support for them 
> > first though (which I don't think we have plans for in the near term).
> Yes, I'm working so far on the assumption that we don't need to represent 
> scatter/gather address vectors as a special vector-of-pointers type.
> 
> (If nothing else, it would be a bit strange for the 64-bit versions, where 
> the vector element isn't even the same //size// as a pointer.)
> 
> If auto-generation of gather loads during vectorization turns out to need a 
> special representation, then I suppose we'll have to rethink.
Sounds good.



Comment at: clang/utils/TableGen/MveEmitter.cpp:207
+// pointer, especially if the pointee is also const.
+if (isa(Pointee)) {
+  if (Const)

simon_tatham wrote:
> dmgreen wrote:
> > Would making this always east const be simpler? Do we generate anything 
> > with inner pointer types? Should there be a whitespace before the "const " 
> > in Name += "const "?
> There shouldn'

[PATCH] D67160: [clang, ARM] Default to -fno-lax-vector-conversions in ARM v8.1-M.

2019-10-09 Thread Dave Green via Phabricator via cfe-commits
dmgreen added a comment.

Yes.. My understanding is that the default is still 
-flax-vector-convertions=all (the old clang behaviour), but the plan is to 
change it (for all architectures) to none.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D67160/new/

https://reviews.llvm.org/D67160



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D67160: [clang, ARM] Default to -fno-lax-vector-conversions in ARM v8.1-M.

2019-10-10 Thread Dave Green via Phabricator via cfe-commits
dmgreen added a subscriber: rsmith.
dmgreen accepted this revision.
dmgreen added a comment.

Yeah, OK. D67678  is the patch that will 
change the default, but only to "int", not to "none" for the moment.

An earlier version of that patch had a different way of setting the default, in 
lib/Basic/Targets/ARM.cpp. I'm not sure that's better or worse than this, they 
both look fine to me.

Under that condition, LGTM!


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D67160/new/

https://reviews.llvm.org/D67160



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D68877: [AArch64][SVE] Implement masked load intrinsics

2019-10-13 Thread Dave Green via Phabricator via cfe-commits
dmgreen added subscribers: samparker, dmgreen.
dmgreen added a comment.

Sam has been looking at extending masked loads and stores in D68337 
 and related patches. There looks like there 
would be some overlap with this, especially in the target independent parts. 
Make sure you co-ordinate with him.




Comment at: llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp:10393
+  ((!LegalOperations && !cast(N0)->isVolatile()) ||
+   TLI.isLoadExtLegal(ISD::SEXTLOAD, VT, EVT))) {
+MaskedLoadSDNode *LN0 = cast(N0);

I'm not convinced that just because a sext load is legal and a masked load is 
legal, that a sext masked load is always legal.



Comment at: llvm/lib/Target/AArch64/AArch64SVEInstrInfo.td:1077
+
+def _default_z : Pat<(Ty (Load  GPR64:$base, (PredTy PPR:$gp), 
(SVEUndef))),
+ (RegImmInst PPR:$gp, GPR64:$base, (i64 0))>;

What if the passthru isn't undef?



Comment at: llvm/lib/Target/AArch64/AArch64TargetTransformInfo.h:151
+  bool isLegalMaskedLoad(Type *DataType) {
+return ST->hasSVE();
+  }

This can handle all masked loads? Of any type, extended into any other type, 
with any alignment?



Comment at: llvm/lib/Target/AArch64/AArch64TargetTransformInfo.h:153
+  }
+  bool isLegalMaskedStore(Type *DataType) {
+return ST->hasSVE();

This patch doesn't handle stores yet.



Comment at: llvm/lib/Target/AArch64/SVEInstrFormats.td:296
 
+def SVEUndef : ComplexPattern;
+

Can this just use "undef"?


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D68877/new/

https://reviews.llvm.org/D68877



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D69025: [Driver,ARM] Make -mfloat-abi=soft turn off MVE.

2019-10-16 Thread Dave Green via Phabricator via cfe-commits
dmgreen added a comment.

I do not know this code super well. What happens to the preprocessor in such 
cases? Does it disable the relevant macros automatically because we've turned 
off the mve and mve.fp features?

Could you add a test?


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D69025/new/

https://reviews.llvm.org/D69025



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D69025: [Driver,ARM] Make -mfloat-abi=soft turn off MVE.

2019-10-16 Thread Dave Green via Phabricator via cfe-commits
dmgreen accepted this revision.
dmgreen added a comment.
This revision is now accepted and ready to land.

Yeah, OK. That makes sense.

LGTM!


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D69025/new/

https://reviews.llvm.org/D69025



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D68877: [AArch64][SVE] Implement masked load intrinsics

2019-10-22 Thread Dave Green via Phabricator via cfe-commits
dmgreen added a comment.

Thanks!

It looks like the only supported parameter of the PassThru here is a splat of 0 
or undef. This might get in the way of IR level optimisation that could try to 
producing a masked load with different passthru, which would then fail to 
select. The ARM backed added a custom lowering to turn this case into a select 
of the loaded value and the passthru. Like this, which was further modified in 
later commits:
https://github.com/llvm/llvm-project/commit/b325c057322ce14b5c561d8ac49508adab7649e5#diff-b853ecd5d5e0f73c2c093ffb5b4f2fbbR8855
That way you only have to check on the zero, not undef. I'm not sure if there 
is support yet for vector selects in the SVE codegen?

I was also expecting something that says "setOperationAction(ISD::MLOAD, VT, 
Legal)" somewhere, but I guess that's already the default?




Comment at: llvm/lib/Target/AArch64/SVEInstrFormats.td:4753
+  let hasSideEffects = 1, hasNoSchedulingInfo = 1, mayLoad = 1 in {
+  def "" : Pseudo<(outs listty:$Zt), (ins PPR3bAny:$Pg, GPR64sp:$Rn, 
simm4s1:$imm4), []>,
+   PseudoInstExpansion<(!cast(NAME # _REAL) listty:$Zt, 
PPR3bAny:$Pg, GPR64sp:$Rn, simm4s1:$imm4)>;

Can you explain why is this pseudo is needed, exactly? I feel that using 
pseudos is often the wrong solution to a problem (it may be required here, I'm 
just sure why exactly).

We currently seem to generate ld1b (for example), over ldnf1b. Is there ever a 
time that we expect to generate a nf load?


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D68877/new/

https://reviews.llvm.org/D68877



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D67161: [clang,ARM] Initial ACLE intrinsics for MVE.

2019-10-23 Thread Dave Green via Phabricator via cfe-commits
dmgreen accepted this revision.
dmgreen added a comment.
This revision is now accepted and ready to land.

I've read this through again and it looks good. If no one else has any issues, 
then LGTM.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D67161/new/

https://reviews.llvm.org/D67161



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D68877: [AArch64][SVE] Implement masked load intrinsics

2019-10-23 Thread Dave Green via Phabricator via cfe-commits
dmgreen accepted this revision.
dmgreen added a comment.
This revision is now accepted and ready to land.

In D68877#1718729 , @kmclaughlin wrote:

> There is not yet support for vector selects, so for this patch the intention 
> was that any passthru which is not all zero or undef would result in a 
> selection failure.
>  Do you think it would acceptable to handle different passthrus in a future 
> patch which also implements vector selects for SVE?


Yeah OK. That makes sense. I remember the same chicken and egg from MVE. We 
ended up doing them the other way around there. Can you try and add a FIXME 
comment somewhere?

Otherwise LGTM.




Comment at: llvm/lib/Target/AArch64/SVEInstrFormats.td:4753
+  let hasSideEffects = 1, hasNoSchedulingInfo = 1, mayLoad = 1 in {
+  def "" : Pseudo<(outs listty:$Zt), (ins PPR3bAny:$Pg, GPR64sp:$Rn, 
simm4s1:$imm4), []>,
+   PseudoInstExpansion<(!cast(NAME # _REAL) listty:$Zt, 
PPR3bAny:$Pg, GPR64sp:$Rn, simm4s1:$imm4)>;

kmclaughlin wrote:
> dmgreen wrote:
> > Can you explain why is this pseudo is needed, exactly? I feel that using 
> > pseudos is often the wrong solution to a problem (it may be required here, 
> > I'm just sure why exactly).
> > 
> > We currently seem to generate ld1b (for example), over ldnf1b. Is there 
> > ever a time that we expect to generate a nf load?
> The pseudo was a workaround that was added downstream for non-faulting loads, 
> but it is not needed here.
Righteo. We know where to find it if it turns out we do need them.


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D68877/new/

https://reviews.llvm.org/D68877



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D69378: [AArch64][SVE] Implement masked store intrinsics

2019-10-24 Thread Dave Green via Phabricator via cfe-commits
dmgreen accepted this revision.
dmgreen added a comment.
This revision is now accepted and ready to land.

Looks good to me. (Stores are easier than loads)


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D69378/new/

https://reviews.llvm.org/D69378



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D67161: [clang,ARM] Initial ACLE intrinsics for MVE.

2019-10-24 Thread Dave Green via Phabricator via cfe-commits
dmgreen added a comment.

Hmm.. Let me take a look. There's a different error on the same build now, but 
I think it's just hiding this one.

I'll also try and fix the tests that are failing in places too, if I can.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D67161/new/

https://reviews.llvm.org/D67161



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D67161: [clang,ARM] Initial ACLE intrinsics for MVE.

2019-10-24 Thread Dave Green via Phabricator via cfe-commits
dmgreen added a comment.

I've hopefully fixed the build in rG7b3de1e81197 
, but it's 
hard to tell for sure with the other error.

I also fixed the tests in rG78700ef8866d 
 (and 
worked out how to remove change-id's from commit messages).

Let us know if anything else looks broken.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D67161/new/

https://reviews.llvm.org/D67161



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D69426: [clang] Switch arm-mve-intrinsics tests to use %clang_cc1.

2019-10-25 Thread Dave Green via Phabricator via cfe-commits
dmgreen accepted this revision.
dmgreen added a comment.
This revision is now accepted and ready to land.

Sounds good.

But make sure that  -fno-discard-value-names is still included. Otherwise the 
names like "entry" will not be output, so the tests fail in "non-assert" builds.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D69426/new/

https://reviews.llvm.org/D69426



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D69426: [clang] Switch arm-mve-intrinsics tests to use %clang_cc1.

2019-10-25 Thread Dave Green via Phabricator via cfe-commits
dmgreen added a comment.

Ah, I see, that's why it was needed. That makes sense.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D69426/new/

https://reviews.llvm.org/D69426



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D79710: [clang][BFloat] add create/set/get/dup intrinsics

2020-05-29 Thread Dave Green via Phabricator via cfe-commits
dmgreen added inline comments.



Comment at: clang/include/clang/Basic/arm_neon.td:1860
+
+  def VGET_HIGH_BF : NoTestOpInst<"vget_high", ".Q", "b", OP_HI>;
+  def VGET_LOW_BF  : NoTestOpInst<"vget_low", ".Q", "b", OP_LO>;

Do you know what InstName = "vmov" does, and is it needed here?

I'm pretty sure it's a vestige of an earlier implementation of the neon emitter.



Comment at: clang/include/clang/Basic/arm_neon.td:1867
+  def SCALAR_VDUP_LANE_BF : IInst<"vdup_lane", "1.I", "Sb">;
+  def SCALAR_VDUP_LANEQ_BF : IInst<"vdup_laneq", "1QI", "Sb">;
+}

Does this need let isLaneQ = 1, like the other vdup_laneq's?



Comment at: clang/include/clang/Basic/arm_neon_incl.td:293
+
+  string CartesianProductWith = "";
 }

Is this needed in this patch?



Comment at: clang/lib/CodeGen/CGBuiltin.cpp:6309
+  case NEON::BI__builtin_neon_vget_lane_bf16:
+  case NEON::BI__builtin_neon_vduph_lane_bf16:
   case NEON::BI__builtin_neon_vgetq_lane_i8:

How come these are needed for vduph_lane_bf16 if they were not needed for 
vduph_lane_f16?



Comment at: clang/test/CodeGen/aarch64-bf16-getset-intrinsics.c:2
+// RUN: %clang_cc1 -triple aarch64-arm-none-eabi -target-feature +neon 
-target-feature +bf16 \
+// RUN:  -O2 -emit-llvm %s -o - | FileCheck %s --check-prefixes=CHECK,CHECK64
+// RUN: %clang_cc1 -triple armv8.6a-arm-none-eabi -target-feature +neon 
-target-feature +bf16 -mfloat-abi hard \

It's best to have auto generated tests if we can, and ideally not rely on 
running the entire -O2 pipeline. I think a lot of the other tests use clang ... 
-disable-O0-optnone | opt -S -mem2reg | Filecheck ...

Also this aarch64 tests is running arm codegen too? Not sure if that matters, 
but I don't immediately see it being done elsewhere.



Comment at: clang/test/CodeGen/aarch64-bf16-getset-intrinsics.c:19-20
+// CHECK-LABEL: test_vdup_n_bf16
+// CHECK64: %vecinit.i = insertelement <4 x bfloat> undef, bfloat %v, i32 0
+// CHECK32: %vecinit.i = insertelement <4 x bfloat> undef, bfloat %v, i32 0
+// CHECK: %vecinit{{.*}} = shufflevector <4 x bfloat> %vecinit.i, <4 x bfloat> 
undef, <4 x i32> zeroinitializer

A lot of these 32 and 64 bit lines look the same.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D79710/new/

https://reviews.llvm.org/D79710



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D81728: [InstCombine] Add target-specific inst combining

2020-07-21 Thread Dave Green via Phabricator via cfe-commits
dmgreen added inline comments.



Comment at: llvm/test/CodeGen/Thumb2/mve-intrinsics/predicates.ll:2
 ; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
-; RUN: opt -instcombine %s | llc -mtriple=thumbv8.1m.main -mattr=+mve.fp 
-verify-machineinstrs -o - | FileCheck %s
+; RUN: opt -instcombine -mtriple=arm %s | llc -mtriple=thumbv8.1m.main 
-mattr=+mve.fp -verify-machineinstrs -o - | FileCheck %s
 

Please use the same triple as llc for any test with "mve" in the title.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D81728/new/

https://reviews.llvm.org/D81728



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D81728: [InstCombine] Add target-specific inst combining

2020-07-21 Thread Dave Green via Phabricator via cfe-commits
dmgreen added a comment.

Thanks


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D81728/new/

https://reviews.llvm.org/D81728



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D78252: [AArch64] FMLA/FMLS patterns improvement.

2020-04-15 Thread Dave Green via Phabricator via cfe-commits
dmgreen added inline comments.



Comment at: llvm/lib/Target/AArch64/AArch64InstrFormats.td:8055
 multiclass SIMDFPIndexedTiedPatterns {
+  let Predicates = [HasNEON, HasFullFP16] in {
+  // 1 variant for the .8h version: DUPLANE from 128-bit

Should we have equal patterns to those below for f32 as well? So using DUP, D 
vector (4xf16) and possibly from a vector_extract too.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D78252/new/

https://reviews.llvm.org/D78252



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D75169: [ARM] Supporting lowering of half-precision FP arguments and returns in AArch32's backend

2020-06-09 Thread Dave Green via Phabricator via cfe-commits
dmgreen added reviewers: efriedma, ostannard, SjoerdMeijer.
dmgreen added a comment.

Love it. This will help optimise these more too. We can currently get into 
places where we can't prove only the bottom bits of a returned value are 
demanded so can't remove unnecessary vmovs. Test look good too from my 
understanding of how fp16 calling should work.

I unfortunately can't really claim to be an expert on calling conventions code 
or if these new functions are needed. Hopefully someone I added can take a 
look, and if not I can always go find out how it works.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D75169/new/

https://reviews.llvm.org/D75169



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D82887: [ARM] Add Cortex-A77 Support for Clang and LLVM

2020-07-01 Thread Dave Green via Phabricator via cfe-commits
dmgreen accepted this revision.
dmgreen added a comment.
This revision is now accepted and ready to land.

Thanks. LGTM.


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D82887/new/

https://reviews.llvm.org/D82887



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D83206: [PATCH] [ARM] Add Cortex-A78 and Cortex-X1 Support for Clang and LLVM

2020-07-08 Thread Dave Green via Phabricator via cfe-commits
dmgreen added a comment.

I would expect this to be very similar to 
https://reviews.llvm.org/rG8bf99f1e6f0f9b426d6060361ea6d9d47c1868d1, but some 
parts seems to be missing. Can you make sure that everything is included and in 
a sensible order.




Comment at: llvm/include/llvm/Support/AArch64TargetParser.def:131
+AARCH64_CPU_NAME("cortex-a78", ARMV8_2A, FK_CRYPTO_NEON_FP_ARMV8, false,
+ (AArch64::AEK_RAS | AArch64::AEK_DOTPROD | AArch64::AEK_RCPC 
| AArch64::AEK_SSBS))
+AARCH64_CPU_NAME("cortex-x1", ARMV8_2A, FK_CRYPTO_NEON_FP_ARMV8, false,

This no longer has FP16?

AEK_RAS I believe should be included in ARMV8_2A,



Comment at: llvm/include/llvm/Support/ARMTargetParser.def:301
+ (ARM::AEK_RAS | ARM::AEK_DOTPROD))
+ARM_CPU_NAME("cortex-a78",ARMV8_2A, FK_CRYPTO_NEON_FP_ARMV8, false,
+ (ARM::AEK_RAS | ARM::AEK_DOTPROD))

All these can go in a better order, please. A78 can go next to A77. Same 
everywhere else.



Comment at: llvm/lib/Target/AArch64/AArch64Subtarget.cpp:196
+  case CortexX1:
+  case CortexA78:
+PrefFunctionLogAlignment = 4;

These can go with the other CortexAXX cpu's, which seem to set the same 
PrefFunctionLogAlignment. Same for the ARM equivalent.


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D83206/new/

https://reviews.llvm.org/D83206



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D82574: Merge TableGen files used for clang options

2020-07-09 Thread Dave Green via Phabricator via cfe-commits
dmgreen added a comment.

Hello

Did you mean to remove CC1Options.td too? Otherwise it does not appear to be 
used any more or this has duplicated the contents.

Also can you make sure the new content of Options.td reflects the latest 
version of CC1Options.td, and that no extra changes have been made since this 
was created.

Thanks


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D82574/new/

https://reviews.llvm.org/D82574



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D83206: [PATCH] [ARM] Add Cortex-A78 and Cortex-X1 Support for Clang and LLVM

2020-07-10 Thread Dave Green via Phabricator via cfe-commits
dmgreen added inline comments.



Comment at: llvm/include/llvm/Support/AArch64TargetParser.def:135
+ (AArch64::AEK_FP16 | AArch64::AEK_DOTPROD | AArch64::AEK_RCPC 
|
+  AArch64::AEK_SSBS | AArch64::AEK_RAS))
 AARCH64_CPU_NAME("neoverse-e1", ARMV8_2A, FK_CRYPTO_NEON_FP_ARMV8, false,

AEK_RAS will be included in ARMV8_2A, I believe.



Comment at: llvm/include/llvm/Support/ARMTargetParser.def:298
+ARM_CPU_NAME("cortex-a78",ARMV8_2A, FK_CRYPTO_NEON_FP_ARMV8, false,
+ (ARM::AEK_RAS | ARM::AEK_DOTPROD))
+ARM_CPU_NAME("cortex-x1", ARMV8_2A, FK_CRYPTO_NEON_FP_ARMV8, false,

This one too I think.


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D83206/new/

https://reviews.llvm.org/D83206



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D83206: [PATCH] [ARM] Add Cortex-A78 and Cortex-X1 Support for Clang and LLVM

2020-07-10 Thread Dave Green via Phabricator via cfe-commits
dmgreen accepted this revision.
dmgreen added a comment.
This revision is now accepted and ready to land.

Thanks. LGTM


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D83206/new/

https://reviews.llvm.org/D83206



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D82887: [ARM] Add Cortex-A77 Support for Clang and LLVM

2020-06-30 Thread Dave Green via Phabricator via cfe-commits
dmgreen added reviewers: dmgreen, ostannard, SjoerdMeijer.
dmgreen added a comment.

Please make sure the switch in AArch64Subtarget::initializeProperties has an 
entry too.

Do you happen to know the cpu id for host.cpp?




Comment at: clang/test/Driver/aarch64-cpus.c:729
 
+// RUN: %clang -target aarch64 -mcpu=cortex-a77  -### -c %s 2>&1 | FileCheck 
-check-prefix=CORTEX-A77 %s
+// CORTEX-A77: "-cc1"{{.*}} "-triple" "aarch64{{.*}}" "-target-cpu" 
"cortex-a77"

Can you move this test up to next to the other cpus.



Comment at: llvm/include/llvm/Support/AArch64TargetParser.def:128
+AARCH64_CPU_NAME("cortex-a77", ARMV8_2A, FK_CRYPTO_NEON_FP_ARMV8, false,
+ (AArch64::AEK_FP16 | AArch64::AEK_RCPC | AArch64::AEK_DOTPROD 
| AArch64::AEK_SSBS))
 AARCH64_CPU_NAME("neoverse-e1", ARMV8_2A, FK_CRYPTO_NEON_FP_ARMV8, false,

Format this line - it's getting a bit long. It also would be good to make it 
consistent with the lines above.



Comment at: llvm/include/llvm/Support/ARMTargetParser.def:296
+ARM_CPU_NAME("cortex-a77", ARMV8_2A, FK_CRYPTO_NEON_FP_ARMV8, false,
+(ARM::AEK_FP16 | ARM::AEK_DOTPROD) )
 ARM_CPU_NAME("neoverse-n1", ARMV8_2A, FK_CRYPTO_NEON_FP_ARMV8, false,

Little bit of extra whitespace



Comment at: llvm/lib/Target/AArch64/AArch64.td:954
 def : ProcessorModel<"neoverse-n1", CortexA57Model, [ProcNeoverseN1]>;
+def : ProcessorModel<"cortex-a77", CortexA57Model, [HasV8_2aOps,
+FeatureFPARMv8,

Please add a ProcA77 and sort this next to the A76



Comment at: llvm/lib/Target/ARM/ARM.td:1226
  FeatureDotProd]>;
+def : ProcNoItin<"cortex-a77",  [ARMv82a,
+ FeatureHWDivThumb,

Whitespace here. And Adding a ProcA77 for consistency?


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D82887/new/

https://reviews.llvm.org/D82887



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D78252: [AArch64] FMLA/FMLS patterns improvement.

2020-04-18 Thread Dave Green via Phabricator via cfe-commits
dmgreen added inline comments.



Comment at: llvm/lib/Target/AArch64/AArch64InstrFormats.td:8055
 multiclass SIMDFPIndexedTiedPatterns {
+  let Predicates = [HasNEON, HasFullFP16] in {
+  // 1 variant for the .8h version: DUPLANE from 128-bit

ilinpv wrote:
> dmgreen wrote:
> > Should we have equal patterns to those below for f32 as well? So using DUP, 
> > D vector (4xf16) and possibly from a vector_extract too.
> I'm worried about performance impact of change fmadd/sub -> fmla/ls in last 
> pattern case.
What performance impact are you worried about?



Comment at: llvm/lib/Target/AArch64/AArch64InstrFormats.td:8077
+
+  def : Pat<(f16 (OpNode (f16 FPR16:$Rd), (f16 FPR16:$Rn),
+ (vector_extract (v8f16 V128:$Rm), 
VectorIndexS:$idx))),

Do you mean the v4f16 variant of this pattern?


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D78252/new/

https://reviews.llvm.org/D78252



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D78252: [AArch64] FMLA/FMLS patterns improvement.

2020-04-21 Thread Dave Green via Phabricator via cfe-commits
dmgreen accepted this revision.
dmgreen added a comment.
This revision is now accepted and ready to land.

LGTM. Thanks




Comment at: llvm/lib/Target/AArch64/AArch64InstrFormats.td:8094
 V128:$Rm, VectorIndexS:$idx)>;
-  def : Pat<(f32 (OpNode (f32 FPR32:$Rd), (f32 FPR32:$Rn),
- (vector_extract (v2f32 V64:$Rm), VectorIndexS:$idx))),

I was a little surprised when you said we could remove these, but it looks like 
the vector_extract (v2f32) is always converted to a vector_extract (v4f32 
insert_subvector (v2f32)). So I agree, seems Ok to remove. (And if we do run 
into a problem, we can always add it back in).


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D78252/new/

https://reviews.llvm.org/D78252



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D77872: [AArch32] Armv8.6-a Matrix Mult Assembly + Intrinsics

2020-04-25 Thread Dave Green via Phabricator via cfe-commits
dmgreen added inline comments.



Comment at: llvm/lib/Target/ARM/ARMInstrNEON.td:4846
+  VDOT {
+  let hasNoSchedulingInfo = 1;
+

I don't think that hasNoSchedulingInfo is necessarily the best way to handle 
this. That flag is intended for instructions that will never be scheduled, like 
Pseudo instructions.

If you are running into "Complete Schedule" problems, they might need 
HasMatMulInt8 added to the list of unsupported features instead.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D77872/new/

https://reviews.llvm.org/D77872



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D77872: [AArch32] Armv8.6-a Matrix Mult Assembly + Intrinsics

2020-04-26 Thread Dave Green via Phabricator via cfe-commits
dmgreen added inline comments.
Herald added a project: LLVM.
Herald added a subscriber: llvm-commits.



Comment at: llvm/lib/Target/ARM/ARMInstrNEON.td:4846
+  VDOT {
+  let hasNoSchedulingInfo = 1;
+

LukeGeeson wrote:
> dmgreen wrote:
> > I don't think that hasNoSchedulingInfo is necessarily the best way to 
> > handle this. That flag is intended for instructions that will never be 
> > scheduled, like Pseudo instructions.
> > 
> > If you are running into "Complete Schedule" problems, they might need 
> > HasMatMulInt8 added to the list of unsupported features instead.
> Since there are no 8.6a cpus in llvm that support this extension, the default 
> behaviour is to use Cortex-A57 scheduling - this also has no notion of 8.6a 
> matmul.
> 
> This falls back to SchedMachineModel in 
> `llvm/lib/Target/ARM/ARMScheduleA57.td` and in particular UnsupportedFeatures 
> would be a candidate place to put unsupported features like matmul. I took 
> issue with putting all new extensions here because there should be a 
> separation of concerns between a particular scheduling model, and supporting 
> new behaviour (which could go in a generic catch all location that might be 
> slightly more informative ).
> 
> Did you have something in particular in mind?
The ARMSchedule A57schedule is marked with CompleteModel=1. I believe it's the 
only one ARM-side that does that, but that's why it's running into problems. 
I'm surprised it hasn't run into the same problems in the past.

On the AArch64 side we tend to mark all Neon instructions with WriteV, so they 
at least get basic scheduling info. We don't tend to do that for ARM though.

Marking the instruction as hasNoSchedulingInfo will stop _any_ schedule from 
setting information on them, which whilst technically OK for now would be cause 
problems in the long run. We should either be adding a Sched[] Write (but I'm 
not sure what we would add), marking A57 as not complete (which seems like a 
hack) or adding unsupported features. Or adding scheduling info for these 
instructions to A57 I guess, but that doesn't sound right for a CPU that 
doesn't support them!

I can put a patch together to clean it up pretty easily.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D77872/new/

https://reviews.llvm.org/D77872



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D85575: [ARM] Speed up arm-cortex-cpus.c test

2020-08-08 Thread Dave Green via Phabricator via cfe-commits
dmgreen accepted this revision.
dmgreen added a comment.
This revision is now accepted and ready to land.

Yeah, nice. Sounds good to me.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D85575/new/

https://reviews.llvm.org/D85575

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D155688: [PATCH] [llvm] [InstCombine] Canonicalise ADD+GEP

2023-11-22 Thread Dave Green via Phabricator via cfe-commits
dmgreen added a comment.

I hadn't realized this came from someone at Arm. The performance results I had 
were overall roughly flat, with some improvements and regressions. I think 
there were still some people working through some fixes for some of the 
knock-on effects but with those nothing large would stick out in what I saw.

I would expect Loop Strength Reduction (maybe with CGP) to be able to optimize 
the addressing modes back to something that is optimal for the loop if it can. 
It's not always super reliable though. Might there be something going wrong in 
that pass?


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D155688/new/

https://reviews.llvm.org/D155688

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D128415: [ARM] Add Support for Cortex-M85

2022-07-04 Thread Dave Green via Phabricator via cfe-commits
dmgreen accepted this revision.
dmgreen added a comment.
This revision is now accepted and ready to land.

Other than the release note change it might be worth adding some tests for 
-mcpu=cortex-m85+nopacbti and related configurations.

Otherwise LGTM




Comment at: clang/docs/ReleaseNotes.rst:541
+- clang now supports the Cortex-M85 CPU, which can be chosen with
+  `-mcpu=cortex-m85`. By default, this has PACBTI turned off, but it can be
+  enabled with `-mcpu=cortex-m85+pacbti`.

This can be updated now.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D128415/new/

https://reviews.llvm.org/D128415

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D119720: [ARM] Pass for Cortex-A57 and Cortex-A72 Fused AES Erratum

2022-03-31 Thread Dave Green via Phabricator via cfe-commits
dmgreen added inline comments.



Comment at: llvm/lib/Target/ARM/ARMFixCortexA57AES1742098Pass.cpp:145
+  case ARM::VMVNq:
+return CondCodeIsAL(3);
+  // VMOV of 64-bit value between D registers (when condition = al)

Can/should all these use findFirstPredOperandIdx?

And is it worth checking for more instruction? Anything that sets a Q register, 
or is that too broad?



Comment at: llvm/lib/Target/ARM/ARMFixCortexA57AES1742098Pass.cpp:250
+// Early return if no instructions are the start of an AES Pair.
+if (!llvm::any_of(MBB.instrs(), isFirstAESPairInstr))
+  continue;

This needn't scan through checking for the instruction that the loop below is 
going to check for too.



Comment at: llvm/lib/Target/ARM/ARMFixCortexA57AES1742098Pass.cpp:306
+  // fixup at the start of the function.
+  LLVM_DEBUG(dbgs()
+ << "Fixup Planned: Live-In (with safe defining instrs): "

Can you explain more about the IsLiveIn && UnsafeCount==0 case. Am I 
understanding that correctly that it would be:
```
function(q0, ...)
  lotsofcode...
  q0 = load
  aes q0
```
Is there a better way to detect that the live-in doesn't matter in cases like 
that?



Comment at: llvm/lib/Target/ARM/ARMTargetMachine.cpp:585
 
+  addPass(createARMFixCortexA57AES1742098Pass());
+

Passes can't insert new instructions (or move things further apart) after 
ConstantIslandPass. The branches/constant pools it has created may go out of 
range of the instructions that use them. Would it be OK to move it before that?


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D119720/new/

https://reviews.llvm.org/D119720

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D119720: [ARM] Pass for Cortex-A57 and Cortex-A72 Fused AES Erratum

2022-03-31 Thread Dave Green via Phabricator via cfe-commits
dmgreen added inline comments.



Comment at: llvm/lib/Target/ARM/ARMFixCortexA57AES1742098Pass.cpp:200-207
+  case ARM::VMOVv2f32:
+  case ARM::VMOVv4f32:
+  case ARM::VMOVv2i32:
+  case ARM::VMOVv4i32:
+  case ARM::VMOVv4i16:
+  case ARM::VMOVv8i16:
+  case ARM::VMOVv8i8:

Are these vmov of an immediate? Are they not safe?

I was expecting it to be the lanes sets (VSETLNi8) and other scalar 
instructions that were unsafe.



Comment at: llvm/lib/Target/ARM/ARMFixCortexA57AES1742098Pass.cpp:306
+  // fixup at the start of the function.
+  LLVM_DEBUG(dbgs()
+ << "Fixup Planned: Live-In (with safe defining instrs): "

lenary wrote:
> dmgreen wrote:
> > Can you explain more about the IsLiveIn && UnsafeCount==0 case. Am I 
> > understanding that correctly that it would be:
> > ```
> > function(q0, ...)
> >   lotsofcode...
> >   q0 = load
> >   aes q0
> > ```
> > Is there a better way to detect that the live-in doesn't matter in cases 
> > like that?
> I don't believe there is, and this comes down to issues with the 
> `RDA.getGlobalReachingDefs` implementation, which I want to fix/enhance, but 
> in a follow-up patch.
> 
> To start with, this is actually not a problem, as the pass is intended to be 
> conservative, and we know the clobbers are a no-op at the architectural level 
> (we insert them for their micro-architectural effects). So code will still do 
> the right thing, but maybe with a little too much overhead in the case you 
> showed.
> 
> However, this is necessary in some other cases, such as:
> 
> ```
> function(q0)
>code
>conditional branch to L2
> L1:
>q0 = safe_op(…)
>branch to L3
> L2:
>code without update to q0
> L3:
>aes q0
> ```
> 
> In this case, `AllDefs` is a set containing one single defining instruction 
> for Q0, because there is only one within the function (which is all that 
> `RDA.getGlobalReachingDefs` can report instructions for).
> 
> But in my example, we *need* to protect q0 on the other paths, because the 
> safe definitions of q0, when considered as a set, do not entirely dominate 
> the AES use of q0 (this is slightly stretching the conventional definition of 
> dominance, but think of this as "there exists a path from entry to the aes, 
> which does not contain any of the safe instructions". Sadly, MachineDominance 
> doesn't allow us to make this kind of query either!).
> 
> In this case though, it is safe to insert the protection at function entry, 
> because that will (by definition) dominate all the AES uses, and the 
> protection doesn't need to be dominated by the safe definitions, as we know 
> they're safe.
> 
> I intend to follow-up this initial patch with an enhancement to 
> ReachingDefAnalysis which will also provide the information that you have a 
> set of defs inside the function, and also you're live-in, as this is required 
> info for any conservative pass using the ReachingDefAnalysis. I felt, 
> however, that given the pass is safe as-is, it was good to proceed without 
> this enhancement.
> 
> 
> 
> 
OK sounds good.



Comment at: llvm/lib/Target/ARM/ARMTargetMachine.cpp:585
 
+  addPass(createARMFixCortexA57AES1742098Pass());
+

lenary wrote:
> dmgreen wrote:
> > Passes can't insert new instructions (or move things further apart) after 
> > ConstantIslandPass. The branches/constant pools it has created may go out 
> > of range of the instructions that use them. Would it be OK to move it 
> > before that?
> TIL. I'll add a comment about the constant island pass as well.
> 
> Should I also look at the restrictions on the Branch Targets pass? I can 
> imagine we also don't want to separate instructions once we've calculated 
> their targets locally?
Yeah - It sounds like the BTI would need to remain as the first instruction in 
the block.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D119720/new/

https://reviews.llvm.org/D119720

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D130973: [AArch64] Always allow the __bf16 type

2022-08-04 Thread Dave Green via Phabricator via cfe-commits
This revision was landed with ongoing or failed builds.
This revision was automatically updated to reflect the committed changes.
Closed by commit rG8c30f4a5ab3e: [AArch64] Always allow the __bf16 type 
(authored by dmgreen).
Herald added a project: clang.
Herald added a subscriber: cfe-commits.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D130973/new/

https://reviews.llvm.org/D130973

Files:
  clang/docs/LanguageExtensions.rst
  clang/lib/Basic/Targets/AArch64.cpp
  clang/lib/Basic/Targets/AArch64.h
  clang/test/CodeGen/arm-bf16-params-returns.c
  clang/test/CodeGen/arm-mangle-bf16.cpp
  clang/test/Sema/arm-bf16-forbidden-ops.c
  clang/test/Sema/arm-bf16-forbidden-ops.cpp
  clang/test/Sema/arm-bfloat.cpp

Index: clang/test/Sema/arm-bfloat.cpp
===
--- clang/test/Sema/arm-bfloat.cpp
+++ clang/test/Sema/arm-bfloat.cpp
@@ -1,49 +1,57 @@
-// RUN: %clang_cc1 -fsyntax-only -verify -std=c++11 \
-// RUN: -triple aarch64-arm-none-eabi -target-cpu cortex-a75 \
-// RUN: -target-feature +bf16 -target-feature +neon %s
-// RUN: %clang_cc1 -fsyntax-only -verify -std=c++11 \
-// RUN: -triple arm-arm-none-eabi -target-cpu cortex-a53 \
-// RUN: -target-feature +bf16 -target-feature +neon %s
+// RUN: %clang_cc1 -fsyntax-only -verify=scalar,neon -std=c++11 \
+// RUN:   -triple aarch64-arm-none-eabi -target-cpu cortex-a75 \
+// RUN:   -target-feature +bf16 -target-feature +neon %s
+// RUN: %clang_cc1 -fsyntax-only -verify=scalar,neon -std=c++11 \
+// RUN:   -triple arm-arm-none-eabi -target-cpu cortex-a53 \
+// RUN:   -target-feature +bf16 -target-feature +neon %s
+
+// The types should be available under AArch64 even without the bf16 feature
+// RUN: %clang_cc1 -fsyntax-only -verify=scalar -DNONEON -std=c++11 \
+// RUN:   -triple aarch64-arm-none-eabi -target-cpu cortex-a75 \
+// RUN:   -target-feature -bf16 -target-feature +neon %s
 
 // REQUIRES: aarch64-registered-target || arm-registered-target
 
 void test(bool b) {
   __bf16 bf16;
 
-  bf16 + bf16; // expected-error {{invalid operands to binary expression ('__bf16' and '__bf16')}}
-  bf16 - bf16; // expected-error {{invalid operands to binary expression ('__bf16' and '__bf16')}}
-  bf16 * bf16; // expected-error {{invalid operands to binary expression ('__bf16' and '__bf16')}}
-  bf16 / bf16; // expected-error {{invalid operands to binary expression ('__bf16' and '__bf16')}}
+  bf16 + bf16; // scalar-error {{invalid operands to binary expression ('__bf16' and '__bf16')}}
+  bf16 - bf16; // scalar-error {{invalid operands to binary expression ('__bf16' and '__bf16')}}
+  bf16 * bf16; // scalar-error {{invalid operands to binary expression ('__bf16' and '__bf16')}}
+  bf16 / bf16; // scalar-error {{invalid operands to binary expression ('__bf16' and '__bf16')}}
 
   __fp16 fp16;
 
-  bf16 + fp16; // expected-error {{invalid operands to binary expression ('__bf16' and '__fp16')}}
-  fp16 + bf16; // expected-error {{invalid operands to binary expression ('__fp16' and '__bf16')}}
-  bf16 - fp16; // expected-error {{invalid operands to binary expression ('__bf16' and '__fp16')}}
-  fp16 - bf16; // expected-error {{invalid operands to binary expression ('__fp16' and '__bf16')}}
-  bf16 * fp16; // expected-error {{invalid operands to binary expression ('__bf16' and '__fp16')}}
-  fp16 * bf16; // expected-error {{invalid operands to binary expression ('__fp16' and '__bf16')}}
-  bf16 / fp16; // expected-error {{invalid operands to binary expression ('__bf16' and '__fp16')}}
-  fp16 / bf16; // expected-error {{invalid operands to binary expression ('__fp16' and '__bf16')}}
-  bf16 = fp16; // expected-error {{assigning to '__bf16' from incompatible type '__fp16'}}
-  fp16 = bf16; // expected-error {{assigning to '__fp16' from incompatible type '__bf16'}}
-  bf16 + (b ? fp16 : bf16); // expected-error {{incompatible operand types ('__fp16' and '__bf16')}}
+  bf16 + fp16; // scalar-error {{invalid operands to binary expression ('__bf16' and '__fp16')}}
+  fp16 + bf16; // scalar-error {{invalid operands to binary expression ('__fp16' and '__bf16')}}
+  bf16 - fp16; // scalar-error {{invalid operands to binary expression ('__bf16' and '__fp16')}}
+  fp16 - bf16; // scalar-error {{invalid operands to binary expression ('__fp16' and '__bf16')}}
+  bf16 * fp16; // scalar-error {{invalid operands to binary expression ('__bf16' and '__fp16')}}
+  fp16 * bf16; // scalar-error {{invalid operands to binary expression ('__fp16' and '__bf16')}}
+  bf16 / fp16; // scalar-error {{invalid operands to binary expression ('__bf16' and '__fp16')}}
+  fp16 / bf16; // scalar-error {{invalid operands to binary expression ('__fp16' and '__bf16')}}
+  bf16 = fp16; // scalar-error {{assigning to '__bf16' from incompatible type '__fp16'}}
+  fp16 = bf16; // scalar-error {{assigning to '__fp16' from incompatible type '__bf16'}}
+  bf16 + (b ? fp16 : bf16); // scalar-error {{incompatible operand typ

[PATCH] D127812: [AArch64] Function multiversioning support added.

2022-08-08 Thread Dave Green via Phabricator via cfe-commits
dmgreen added inline comments.



Comment at: llvm/include/llvm/Support/AArch64TargetParser.def:188
+AARCH64_CPU_FEATURE("sve_aes",SVE_AES,   "+sve2-aes", 330)
+AARCH64_CPU_FEATURE("sve_pmull128",   SVE_PMULL128,  "",  340)
+AARCH64_CPU_FEATURE("sve_bitperm",SVE_BITPERM,   "+sve2-bitperm", 350)

The compiler already has names for these, used in the existing -march=... 
options and target(...) attributes. It would be better to not have to invent a 
new set of names that is slightly different.

It would be good if this could be combined with the table above too. Hopefully 
it is just an expansion to the values already present.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D127812/new/

https://reviews.llvm.org/D127812

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D120527: [OpaquePtr][AArch64] Use elementtype on ldxr/stxr

2022-03-10 Thread Dave Green via Phabricator via cfe-commits
dmgreen added a comment.

Seems OK. Thanks for the patch.

Do opaque pointer variants (like `i32 @llvm.aarch64.stxr.p0(i64 1, ptr 
elementtype(i64) %ptr.0)`) get tested automatically from the existing tests 
once -opaque-pointers is the default?


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D120527/new/

https://reviews.llvm.org/D120527

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D121792: [AArch64][SVE] InstCombine llvm.aarch64.sve.sel to select

2022-03-17 Thread Dave Green via Phabricator via cfe-commits
dmgreen added a comment.

Why do we have llvm.aarch64.sve.sel if we are always going to replace it with a 
select? Why not remove llvm.aarch64.sve.sel entirely?


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D121792/new/

https://reviews.llvm.org/D121792

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D122046: [clang] Remove Address::deprecated from MveEmitter

2022-03-21 Thread Dave Green via Phabricator via cfe-commits
dmgreen added a reviewer: simon_tatham.
dmgreen added a comment.

> I'm very out of my depth with tablegen, let me know if there's a more elegant 
> way of doing this

I think this sounds OK. Providing all the tests pass it looks OK (but I'm not a 
big expert myself)


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D122046/new/

https://reviews.llvm.org/D122046

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D119720: [ARM] Pass for Cortex-A57 and Cortex-A72 Fused AES Erratum

2022-04-08 Thread Dave Green via Phabricator via cfe-commits
dmgreen added a comment.

Looks OK to me, as far as I can see. If it worth adding a few extra 
instructions that may come up?




Comment at: llvm/lib/Target/ARM/ARMFixCortexA57AES1742098Pass.cpp:146
+  case ARM::VMVNd:
+  case ARM::VMVNq:
+  // VMOV of 64-bit value between D registers (when condition = al)

Perhaps add these, if they are safe:
VBICd/q
VBICi's, VORRi's
VBIF/VBIT/VBSL/VBSP
VCEQ/VCNE/etc?
VDUP? VEXT?
VMVN imm equivalents of VMOV's
VREV's?
VSHL's, VSHR's?
I'm not sure if they will be very useful, but they are the kind of instructions 
that may come up in aes algorithms.



Comment at: llvm/lib/Target/ARM/ARMTargetMachine.cpp:588-589
   addPass(createARMBranchTargetsPass());
+  // Inserts Constant Islands. No new instructions may be inserted after this
+  // point, as this will affect the offsets used for accessing these constants.
   addPass(createARMConstantIslandPass());

"No new instructions may be inserted" -> "Block sizes cannot be increased"
And maybe "will affect the offsets used for accessing these constants." -> "may 
push the branch ranges and load offsets of accessing constant pools out of 
range."



Comment at: llvm/lib/Target/ARM/ARMTargetMachine.cpp:591-593
+  // Finalises Low-Overhead Loops. This relies on knowing the final block size,
+  // but can run after constant islands as it does not insert additional
+  // instructions.

It's not about "not inserting instructions" exactly - it will replace psuedos 
with all kinds of new instructions :)
The pseudos needed to have a conservative size through ConstantIslandPass 
though to allow that though. It does make sure that it will not move 
instructions further apart from their targets.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D119720/new/

https://reviews.llvm.org/D119720

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D115620: [AArch64] Lowering and legalization of strict FP16

2022-04-12 Thread Dave Green via Phabricator via cfe-commits
dmgreen added a comment.

I might have misunderstood which this patch was. Can you move the clang test 
into D118259  with the other? It seems like 
the same problem, and it looks like there should be enough llc tests to cover 
all the cases.




Comment at: llvm/lib/Target/AArch64/AArch64ISelLowering.cpp:1386
+
+  IsStrictFPEnabled = true;
 }

What are the ramifications of setting this to true?


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D115620/new/

https://reviews.llvm.org/D115620

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D115620: [AArch64] Lowering and legalization of strict FP16

2022-04-14 Thread Dave Green via Phabricator via cfe-commits
dmgreen accepted this revision.
dmgreen added a comment.
This revision is now accepted and ready to land.

OK Thanks. This LGTM then.


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D115620/new/

https://reviews.llvm.org/D115620

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D118257: [AArch64] Generate fcmps when appropriate for neon intrinsics

2022-02-03 Thread Dave Green via Phabricator via cfe-commits
dmgreen accepted this revision.
dmgreen added a comment.
This revision is now accepted and ready to land.

This seems OK. LGTM


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D118257/new/

https://reviews.llvm.org/D118257

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D118044: [ARM] Undeprecate complex IT blocks

2022-02-04 Thread Dave Green via Phabricator via cfe-commits
dmgreen added a comment.

The actual code change looks fine to me, providing we can clean up these test 
changes a bit and no-one else has any other comments.




Comment at: llvm/test/CodeGen/ARM/2013-05-05-IfConvertBug.ll:196
 ; Hard-coded registers comes from the ABI.
 ; CHECK-LABEL: wrapDistance:
 ; CHECK: cmp r1, #59

If you autogenerate check lines we have to make sure we remove the old ones, 
but I'm not sure the new check lines are filling in all the details. They can 
do that if the check lines dont match between multiple run lines with the same 
check-prefix.

It is probably simplest to change the check lines to
; RUN: llc < %s -mtriple=thumbv8 -arm-restrict-it | FileCheck 
-check-prefix=CHECK-V8 %s
; RUN: llc < %s -mtriple=thumbv7 -arm-restrict-it | FileCheck 
-check-prefix=CHECK-V8 %s
And remove all the other differences from this file.



Comment at: llvm/test/CodeGen/ARM/arm-bf16-pcs.ll:190
 ; BASE-THUMB-NEXT:strh.w r0, [sp, #6]
+; BASE-THUMB-NEXT:uxth r1, r0
 ; BASE-THUMB-NEXT:mov r0, r5

john.brawn wrote:
> This, and the cases in other tests where we have a uxth/uxtb that moves, 
> looks rather strange and not something I'd expect given that there's no IT 
> here. Do you know what's going on here?
Given restrictIT we run Thumb2SizeReduce earlier, which can change the 
scheduling.



Comment at: llvm/test/CodeGen/ARM/ifcvt-branch-weight.ll:22
+; CHECK: bb.1.bb2:
+; CHECK: successors: %bb.2(0x4000)
 

MarkMurrayARM wrote:
> dmgreen wrote:
> > I'm not sure this is still checking anything useful. How has the full 
> > output changed? Is there not a block with two outputs anymore?
> It is a failure caused by my change. 
From what I can tell, this test is trying test that the block weights are 
correct after ifcvt. It would be best to make sure that is still being tested.

It's probably best to add -arm-restrict-it to this test, so it can keep testing 
the same thing.



Comment at: llvm/test/CodeGen/ARM/speculation-hardening-sls.ll:38
 ; NOHARDENARM: {{bxgt lr$}}
-; NOHARDENTHUMB:   {{bx lr$}}
 ; ISBDSB-NEXT: dsb sy

dmgreen wrote:
> What happened to this check line? Should it be bxgt lr now?
Change this to `; NOHARDENTHUMB:   {{bxgt lr$}}`
It should then be the only change needed in this file, the other checks will 
fall into place without needing any modification. It looks like different lines 
were matching than what was expected, which threw off the ones below.



Comment at: llvm/test/CodeGen/Thumb2/v8_IT_3.ll:4
 ; RUN: llc < %s -mtriple=thumbv8 -arm-atomic-cfg-tidy=0 -relocation-model=pic 
| FileCheck %s --check-prefix=CHECK-PIC
-; RUN: llc < %s -mtriple=thumbv7 -arm-atomic-cfg-tidy=0 -arm-restrict-it 
-relocation-model=pic | FileCheck %s --check-prefix=CHECK-PIC
+; RUN: llc < %s -mtriple=thumbv7 -arm-atomic-cfg-tidy=0 -arm-restrict-it 
-relocation-model=pic | FileCheck %s --check-prefix=CHECK-PIC-RESTRICT-IT 
--check-prefix=CHECK-RESTRICT-IT
 

MarkMurrayARM wrote:
> dmgreen wrote:
> > I'm not sure what this test is doing. Can you just remove -arm-restrict-it 
> > and update the check lines?
> It's checking restricted ID blocks in position-independent code. I don't see 
> what your request will achieve?
Oh OK. I hadn't realized this test was for restricts it blocks specifically. In 
that case maybe just add -arm-restrict-it to the thumbv8 lines and leave the 
check lines as-is. That would then be the same as v8_IT_5.ll.

Otherwise, this needn't have both CHECK-PIC-RESTRICT-IT and CHECK-RESTRICT-IT, 
one of the two should be fine.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D118044/new/

https://reviews.llvm.org/D118044

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D118044: [ARM] Undeprecate complex IT blocks

2022-02-07 Thread Dave Green via Phabricator via cfe-commits
dmgreen accepted this revision.
dmgreen added a comment.
This revision is now accepted and ready to land.

Thanks for the updates. LGTM.




Comment at: llvm/test/CodeGen/ARM/ifcvt-branch-weight.ll:21
 
-; CHECK: bb.2.bb2:
+; CHECK: bb.1.bb:
 ; CHECK: successors: %bb.4(0x4000), %bb.3(0x4000)

I think this should still be bb.2.bb2


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D118044/new/

https://reviews.llvm.org/D118044

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D117795: [AArch64] Add some missing strict FP vector lowering

2022-02-07 Thread Dave Green via Phabricator via cfe-commits
dmgreen accepted this revision.
dmgreen added a comment.
This revision is now accepted and ready to land.

Thanks for the changes. LGTM


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D117795/new/

https://reviews.llvm.org/D117795

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D117112: [AArch64] Support for Ampere1 core

2022-04-28 Thread Dave Green via Phabricator via cfe-commits
dmgreen accepted this revision.
dmgreen added a comment.
This revision is now accepted and ready to land.

I don't know the details of the scheduling latencies for the core, but this 
looks perfectly sensible. There are a few comments inline, but other than those 
this patch LGTM.




Comment at: llvm/include/llvm/Support/AArch64TargetParser.def:296
+ (AArch64::AEK_FP16 | AArch64::AEK_MTE | AArch64::AEK_SB |
+ AArch64::AEK_SSBS))
 // Invalid CPU

The formatting is a bit off here.



Comment at: llvm/lib/Target/AArch64/AArch64.td:1082
FeatureFullFP16, FeatureFP16FML, 
FeatureDotProd];
+  list Ampere1 = [HasV8_6aOps, FeatureNEON, FeaturePerfMon];
 

Should this include SSBS and MTE too, if they are included in the target 
parsing?



Comment at: llvm/lib/Target/AArch64/AArch64Subtarget.h:83
+TSV110,
+Ampere1
   };

dmgreen wrote:
> This list can be alphabetical.
This still needs to be alphabetical


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D117112/new/

https://reviews.llvm.org/D117112

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D117112: [AArch64] Support for Ampere1 core

2022-05-03 Thread Dave Green via Phabricator via cfe-commits
This revision was landed with ongoing or failed builds.
This revision was automatically updated to reflect the committed changes.
Closed by commit rG64816e68f441: [AArch64] Support for Ampere1 core (authored 
by philipp.tomsich, committed by dmgreen).

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D117112/new/

https://reviews.llvm.org/D117112

Files:
  clang/test/Misc/target-invalid-cpu-note.c
  llvm/include/llvm/Support/AArch64TargetParser.def
  llvm/lib/Target/AArch64/AArch64.td
  llvm/lib/Target/AArch64/AArch64SchedAmpere1.td
  llvm/lib/Target/AArch64/AArch64SchedPredAmpere.td
  llvm/lib/Target/AArch64/AArch64SchedPredicates.td
  llvm/lib/Target/AArch64/AArch64Subtarget.cpp
  llvm/lib/Target/AArch64/AArch64Subtarget.h
  llvm/test/CodeGen/AArch64/cpus.ll
  llvm/test/CodeGen/AArch64/neon-dot-product.ll
  llvm/test/CodeGen/AArch64/remat.ll
  llvm/test/MC/AArch64/armv8.2a-dotprod.s
  llvm/test/MC/AArch64/armv8.3a-rcpc.s
  llvm/test/MC/Disassembler/AArch64/armv8.3a-rcpc.txt
  llvm/unittests/Support/TargetParserTest.cpp

Index: llvm/unittests/Support/TargetParserTest.cpp
===
--- llvm/unittests/Support/TargetParserTest.cpp
+++ llvm/unittests/Support/TargetParserTest.cpp
@@ -1195,6 +1195,14 @@
  AArch64::AEK_SVE2 | AArch64::AEK_SVE2BITPERM |
  AArch64::AEK_BF16 | AArch64::AEK_I8MM,
  "8.5-A"),
+ARMCPUTestParams("ampere1", "armv8.6-a", "crypto-neon-fp-armv8",
+ AArch64::AEK_CRC  | AArch64::AEK_FP   | AArch64::AEK_FP16   |
+ AArch64::AEK_SIMD | AArch64::AEK_RAS  | AArch64::AEK_LSE |
+ AArch64::AEK_RDM  | AArch64::AEK_RCPC | AArch64::AEK_DOTPROD |
+ AArch64::AEK_SM4  | AArch64::AEK_SHA3 | AArch64::AEK_BF16|
+ AArch64::AEK_SHA2 | AArch64::AEK_AES  | AArch64::AEK_I8MM|
+ AArch64::AEK_MTE  | AArch64::AEK_SSBS | AArch64::AEK_SB,
+ "8.6-A"),
 ARMCPUTestParams(
 "neoverse-512tvb", "armv8.4-a", "crypto-neon-fp-armv8",
 AArch64::AEK_RAS | AArch64::AEK_SVE | AArch64::AEK_SSBS |
@@ -1256,7 +1264,7 @@
  AArch64::AEK_LSE | AArch64::AEK_RDM,
  "8.2-A")));
 
-static constexpr unsigned NumAArch64CPUArchs = 53;
+static constexpr unsigned NumAArch64CPUArchs = 54;
 
 TEST(TargetParserTest, testAArch64CPUArchList) {
   SmallVector List;
Index: llvm/test/MC/Disassembler/AArch64/armv8.3a-rcpc.txt
===
--- llvm/test/MC/Disassembler/AArch64/armv8.3a-rcpc.txt
+++ llvm/test/MC/Disassembler/AArch64/armv8.3a-rcpc.txt
@@ -12,6 +12,7 @@
 # RUN: llvm-mc -triple aarch64-none-linux-gnu -mcpu=neoverse-e1 --disassemble < %s | FileCheck %s
 # RUN: llvm-mc -triple aarch64-none-linux-gnu -mcpu=neoverse-n1 --disassemble < %s | FileCheck %s
 # RUN: llvm-mc -triple aarch64-none-linux-gnu -mcpu=neoverse-n2 --disassemble < %s | FileCheck %s
+# RUN: llvm-mc -triple aarch64-none-linux-gnu -mcpu=ampere1 --disassemble < %s | FileCheck %s
 
 # CHECK: ldaprb w0, [x0]
 # CHECK: ldaprh w0, [x0]
Index: llvm/test/MC/AArch64/armv8.3a-rcpc.s
===
--- llvm/test/MC/AArch64/armv8.3a-rcpc.s
+++ llvm/test/MC/AArch64/armv8.3a-rcpc.s
@@ -6,6 +6,7 @@
 // RUN: llvm-mc -triple aarch64-none-linux-gnu -show-encoding -mcpu=neoverse-e1 < %s 2>&1 | FileCheck %s
 // RUN: llvm-mc -triple aarch64-none-linux-gnu -show-encoding -mcpu=neoverse-n1 < %s 2>&1 | FileCheck %s
 // RUN: llvm-mc -triple aarch64-none-linux-gnu -show-encoding -mcpu=neoverse-n2 < %s 2>&1 | FileCheck %s
+// RUN: llvm-mc -triple aarch64-none-linux-gnu -show-encoding -mcpu=ampere1 < %s 2>&1 | FileCheck %s
 // RUN: llvm-mc -triple aarch64-none-linux-gnu -show-encoding -mattr=+v8.2a -mattr=+rcpc < %s 2>&1 | FileCheck %s
 // RUN: not llvm-mc -triple aarch64-none-linux-gnu -mattr=+v8.2a < %s 2> %t
 // RUN: FileCheck --check-prefix=CHECK-REQ %s < %t
Index: llvm/test/MC/AArch64/armv8.2a-dotprod.s
===
--- llvm/test/MC/AArch64/armv8.2a-dotprod.s
+++ llvm/test/MC/AArch64/armv8.2a-dotprod.s
@@ -13,6 +13,7 @@
 // RUN: llvm-mc -triple aarch64 -mcpu=tsv110 -show-encoding < %s | FileCheck %s --check-prefix=CHECK-DOTPROD
 // RUN: llvm-mc -triple aarch64 -mcpu=cortex-r82 -show-encoding < %s | FileCheck %s --check-prefix=CHECK-DOTPROD
 // RUN: llvm-mc -triple aarch64 -mattr=+v8r -show-encoding < %s | FileCheck %s --check-prefix=CHECK-DOTPROD
+// RUN: llvm-mc -triple aarch64 -mcpu=ampere1 -show-encoding < %s | FileCheck %s --check-prefix=CHECK-DOTPROD
 
 // RUN: not llvm-mc -triple aarch64 -mattr=+v8.2a -show-encoding < %s 2> %t
 // RUN: FileCheck --check-prefix=CHECK-NO-DOTP

[PATCH] D119720: [ARM] Pass for Cortex-A57 and Cortex-A72 Fused AES Erratum

2022-05-10 Thread Dave Green via Phabricator via cfe-commits
dmgreen added inline comments.



Comment at: llvm/lib/Target/ARM/ARMFixCortexA57AES1742098Pass.cpp:146
+  case ARM::VMVNd:
+  case ARM::VMVNq:
+  // VMOV of 64-bit value between D registers (when condition = al)

lenary wrote:
> dmgreen wrote:
> > Perhaps add these, if they are safe:
> > VBICd/q
> > VBICi's, VORRi's
> > VBIF/VBIT/VBSL/VBSP
> > VCEQ/VCNE/etc?
> > VDUP? VEXT?
> > VMVN imm equivalents of VMOV's
> > VREV's?
> > VSHL's, VSHR's?
> > I'm not sure if they will be very useful, but they are the kind of 
> > instructions that may come up in aes algorithms.
> I'm very keen to avoid scope-creep on this patch, so I'm going to push back 
> on this comment.
> 
> We know this list as given is safe (and have had internal confirmation). I've 
> sent a new email internally with your list of instructions, to find out of 
> they're safe too, but I'd like any answer to that to be part of a follow-up 
> patch rather than blocking this patch for yet longer.
> 
> I believe what I have today is correct, even if the list is not optimal for 
> all expected AES code sequences.
Yeah that sounds OK. So long as you address Simons comments and follow up with 
the instructions at a later date, this LGTM.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D119720/new/

https://reviews.llvm.org/D119720

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D119720: [ARM] Pass for Cortex-A57 and Cortex-A72 Fused AES Erratum

2022-05-13 Thread Dave Green via Phabricator via cfe-commits
dmgreen accepted this revision.
dmgreen added a comment.
This revision is now accepted and ready to land.

Thanks. LGTM


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D119720/new/

https://reviews.llvm.org/D119720

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D125604: [FileCheck] Catch missspelled directives.

2022-05-18 Thread Dave Green via Phabricator via cfe-commits
dmgreen added inline comments.



Comment at: llvm/test/CodeGen/AArch64/fp16-v8-instructions.ll:682-688
+; TODO-CVT-DAG:   fcvtl   [[LO:v[0-9]+\.4s]], v0.4h
+; TODO-CVT-DAG:   fcvtl2  [[HI:v[0-9]+\.4s]], v0.8h
+; TODO-CVT-DAG:   fcvtzs  [[LOF32:v[0-9]+\.4s]], [[LO]]
+; TODO-CVT-DAG:   xtn [[I16:v[0-9]+]].4h, [[LOF32]]
+; TODO-CVT-DAG:   fcvtzs  [[HIF32:v[0-9]+\.4s]], [[HI]]
+; TODO-CVT-DAG:   xtn2[[I16]].8h, [[HIF32]]
+; TODO-NEXT:  ret

kosarev wrote:
> @az 
These lines should be removed. The were accidentally left in as the file was 
update_llc_test_check'd



Comment at: llvm/test/CodeGen/Thumb2/thumb2-execute-only-prologue.ll:13
 ; CHECK: .LCPI0_0:
-; CHECK_NEXT:long   4294963196
+; TODO-NEXT: long   4294963196
+

kosarev wrote:
> @keith.walker.arm 
I think `; CHECK-NEXT:  .long 4294965696` should be OK. That looks like it 
would match up with the code.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D125604/new/

https://reviews.llvm.org/D125604

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D72270: [ARM,MVE] Fix many signedness errors in MVE intrinsics.

2020-01-06 Thread Dave Green via Phabricator via cfe-commits
dmgreen added inline comments.



Comment at: clang/include/clang/Basic/arm_mve.td:204
 let params = T.Float in {
-  defm vminnmq : VectorVectorArithmetic<"min_predicated">;
-  defm vmaxnmq : VectorVectorArithmetic<"max_predicated">;
+  defm vminnmq : VectorVectorArithmetic<"min_predicated", (? (u32 0))>;
+  defm vmaxnmq : VectorVectorArithmetic<"max_predicated", (? (u32 0))>;

What do these 0's mean?


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D72270/new/

https://reviews.llvm.org/D72270



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D72270: [ARM,MVE] Fix many signedness errors in MVE intrinsics.

2020-01-06 Thread Dave Green via Phabricator via cfe-commits
dmgreen accepted this revision.
dmgreen added a comment.
This revision is now accepted and ready to land.

OK. LGTM then.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D72270/new/

https://reviews.llvm.org/D72270



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D72268: [ARM,MVE] Support -ve offsets in gather-load intrinsics.

2020-01-06 Thread Dave Green via Phabricator via cfe-commits
dmgreen accepted this revision.
dmgreen added a comment.
This revision is now accepted and ready to land.

LGTM.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D72268/new/

https://reviews.llvm.org/D72268



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D72329: [ARM, MVE] Intrinsics for variable shift instructions.

2020-01-08 Thread Dave Green via Phabricator via cfe-commits
dmgreen accepted this revision.
dmgreen added a comment.
This revision is now accepted and ready to land.

I imagine trying to read what a llvm.arm.mve.vshl means in IR would be quite 
difficult, but it does make lowering them simpler. LGTM


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D72329/new/

https://reviews.llvm.org/D72329



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D72328: [ARM,MVE] Intrinsics for partial-overwrite imm shifts.

2020-01-08 Thread Dave Green via Phabricator via cfe-commits
dmgreen accepted this revision.
dmgreen added a comment.
This revision is now accepted and ready to land.

LGTM I think




Comment at: clang/utils/TableGen/MveEmitter.cpp:1105
+int Num = Op->getValueAsInt("num"), Denom = Op->getValueAsInt("denom");
+unsigned desiredSize = STKind->sizeInBits() * Num / Denom;
 for (const auto &kv : ScalarTypes) {

DesiredSize


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D72328/new/

https://reviews.llvm.org/D72328



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D72444: [ARM,MVE] Fix valid immediate range for vsliq_n.

2020-01-09 Thread Dave Green via Phabricator via cfe-commits
dmgreen accepted this revision.
dmgreen added a comment.
This revision is now accepted and ready to land.

Nice test. LGTM




Comment at: clang/test/Sema/arm-mve-immediates.c:182
+
+  vshrq(vb, 0); // expected-error {{argument value 0 is outside the valid 
range [1, 8]}}
+  vshrq(vb, 9); // expected-error {{argument value 9 is outside the valid 
range [1, 8]}}

This set is repeated from above?


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D72444/new/

https://reviews.llvm.org/D72444



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D72496: [ARM,MVE] Make `vqrshrun` generate the right instruction.

2020-01-10 Thread Dave Green via Phabricator via cfe-commits
dmgreen accepted this revision.
dmgreen added a comment.
This revision is now accepted and ready to land.

Yeah, I said these would be tough to read! LGTM


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D72496/new/

https://reviews.llvm.org/D72496



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D72761: [ARM][MVE][Intrinsics] Add VMINAQ, VMINNMAQ, VMAXAQ, VMAXNMAQ intrinsics.

2020-01-15 Thread Dave Green via Phabricator via cfe-commits
dmgreen added a comment.

Nice one. Good to see codegen changes coming out of these intrinsics.

It took a while for me to figure out what the integer instruction was doing. 
That's a strange one.

The fp case I have a question about below.




Comment at: llvm/lib/Target/ARM/ARMInstrMVE.td:3658
+// Unpredicated v(max|min)nma
+def : Pat<(VTI.Vec (unpred_op (VTI.Vec MQPR:$Qd), (fabs (VTI.Vec 
MQPR:$Qm,
+  (VTI.Vec (Inst (VTI.Vec MQPR:$Qd), (VTI.Vec MQPR:$Qm)))>;

If I'm reading the ARMARM correctly, the fp case seems to preform the abs on 
both operands.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D72761/new/

https://reviews.llvm.org/D72761



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D87972: [OldPM] Pass manager: run SROA after (simple) loop unrolling

2020-09-21 Thread Dave Green via Phabricator via cfe-commits
dmgreen added a comment.

> @dmgreen for arm?:)

This would seem more like a good general codegen cleanup than something that 
would be target dependent. It would probably be more dependent on the code that 
is being run, than the exact target. But yeah, I ran some baremetal tests. Only 
one changed (including in all the codesize tests), which was a nasty 
complicated state machine. It changed between -5% and +3.5%, depend on the cpu 
it ran on. (Unfortunately it went down on the cores I was more interested in).

I have run into this exact problem recently and very nearly put up a very 
similar patch for it. In that case it was making intrinsic MVE code much easier 
to write, as you could rely on loops not clogging up stack array uses after 
they were unrolled. The differences were not quite in the 160% range, but they 
would be nice improvements.

So, a little reluctantly, this does sound like a good idea to me. I asked Sanne 
to run Spec too if he has the time.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D87972/new/

https://reviews.llvm.org/D87972

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D86146: [ARM][BFloat16] Change types of some Arm and AArch64 bf16 intrinsics

2020-08-19 Thread Dave Green via Phabricator via cfe-commits
dmgreen added a comment.

Do we need to upgrade the old bfmmla to the new signatures?




Comment at: llvm/test/CodeGen/ARM/arm-bf16-dotprod-intrinsics.ll:176
-
-define <4 x float> @test_vbfmlaltq_laneq_f32_v2(<4 x float> %r, <8 x bfloat> 
%a, <8 x bfloat> %b) {
-; CHECK-LABEL: test_vbfmlaltq_laneq_f32_v2:

It seems like it's probably worth keeping this test.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D86146/new/

https://reviews.llvm.org/D86146

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D86146: [ARM][BFloat16] Change types of some Arm and AArch64 bf16 intrinsics

2020-08-20 Thread Dave Green via Phabricator via cfe-commits
dmgreen added a comment.

I mean, do we need upgrade code in llvm/lib/IR/AutoUpgrade.cpp for the new 
intrinsic forms?


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D86146/new/

https://reviews.llvm.org/D86146

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D86146: [ARM][BFloat16] Change types of some Arm and AArch64 bf16 intrinsics

2020-08-27 Thread Dave Green via Phabricator via cfe-commits
dmgreen accepted this revision.
dmgreen added a comment.
This revision is now accepted and ready to land.

Thanks. LGTM.




Comment at: llvm/include/llvm/IR/IntrinsicsAArch64.td:473
   def int_aarch64_neon_bfdot : AdvSIMD_Dot_Intrinsic;
-  def int_aarch64_neon_bfmmla : AdvSIMD_MatMul_Intrinsic;
-  def int_aarch64_neon_bfmlalb : AdvSIMD_FML_Intrinsic;
-  def int_aarch64_neon_bfmlalt : AdvSIMD_FML_Intrinsic;
+  def int_aarch64_neon_bfmmla
+: Intrinsic<[llvm_v4f32_ty],

This can be a AdvSIMD_BF16FML_Intrinsic?


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D86146/new/

https://reviews.llvm.org/D86146

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D87188: [InstCombine] Canonicalize SPF to abs intrinc

2020-12-18 Thread Dave Green via Phabricator via cfe-commits
dmgreen added a comment.

Thanks for the reproducer. I verified that it does indeed fail with this patch.

It seems to be doing this as a knock-on effect: https://godbolt.org/z/Y4z3je, 
which does not verify: https://alive2.llvm.org/ce/z/PN7Rv5 ?


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D87188/new/

https://reviews.llvm.org/D87188

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D87188: [InstCombine] Canonicalize SPF to abs intrinc

2020-12-18 Thread Dave Green via Phabricator via cfe-commits
dmgreen added a comment.

Yeah. The reproducer seems to work OK with a patch something like this:

  diff --git a/llvm/lib/Analysis/InstructionSimplify.cpp 
b/llvm/lib/Analysis/InstructionSimplify.cpp
  index 35c21a0..c517286 100644 

  --- a/llvm/lib/Analysis/InstructionSimplify.cpp   

  +++ b/llvm/lib/Analysis/InstructionSimplify.cpp   

  @@ -4020,11 +4020,11 @@ static Value *simplifySelectWithICmpCond(Value 
*CondVal, Value *TrueVal,  


   // X == 0 ? abs(X) : -abs(X) --> -abs(X) 

   // X == 0 ? -abs(X) : abs(X) --> abs(X)  

  -if (match(TrueVal, m_Intrinsic(m_Value(X))) &&   

  -match(FalseVal, m_Neg(m_Intrinsic(m_Specific(X)  

  +if (match(TrueVal, m_Intrinsic(m_Specific(CmpLHS))) &&   

  +match(FalseVal, 
m_Neg(m_Intrinsic(m_Specific(CmpLHS) 
 return FalseVal;   

  -if (match(TrueVal, m_Neg(m_Intrinsic(m_Value(X &&

  -match(FalseVal, m_Intrinsic(m_Specific(X 

  +if (match(TrueVal, 
m_Neg(m_Intrinsic(m_Specific(CmpLHS &&
  +match(FalseVal, m_Intrinsic(m_Specific(CmpLHS

 return FalseVal;   

 }  




(I must admit, was fully expecting this to be something wrong in the AArch64 
backend. I'll leave that fix to you if you are happy to add tests and whatnot.)


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D87188/new/

https://reviews.llvm.org/D87188

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D93022: [ARM][AArch64] Add Cortex-A78C Support for Clang and LLVM

2020-12-24 Thread Dave Green via Phabricator via cfe-commits
dmgreen added inline comments.



Comment at: llvm/include/llvm/Support/ARMTargetParser.def:306
+ARM_CPU_NAME("cortex-a78c", ARMV8_2A, FK_CRYPTO_NEON_FP_ARMV8, false,
+ ARM::AEK_RAS)
 ARM_CPU_NAME("cortex-x1", ARMV8_2A, FK_CRYPTO_NEON_FP_ARMV8, false,

RAS is already a part of ARMV8_2A. I think this should include ARM::AEK_FP16 | 
ARM::AEK_DOTPROD though, like the cortex-a78.



Comment at: llvm/lib/Target/AArch64/AArch64.td:700
+FeatureFMI,
+FeatureFPARMv8,
+FeatureFuseAES,

There are some missing here too from the above A78, like FeatureRCPC and 
FeatureSSBS (and DotProd)


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D93022/new/

https://reviews.llvm.org/D93022

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D93022: [ARM][AArch64] Add Cortex-A78C Support for Clang and LLVM

2020-12-24 Thread Dave Green via Phabricator via cfe-commits
dmgreen added inline comments.



Comment at: llvm/include/llvm/Support/AArch64TargetParser.def:151
+AARCH64_CPU_NAME("cortex-a78c", ARMV8_2A, FK_CRYPTO_NEON_FP_ARMV8, false,
+ (AArch64::AEK_RAS))
 AARCH64_CPU_NAME("cortex-r82", ARMV8R, FK_CRYPTO_NEON_FP_ARMV8, false,

This needs changing too.



Comment at: llvm/lib/Target/ARM/ARM.td:1313
 
+def : ProcNoItin<"cortex-a78c", [ARMv82a,
+ FeatureHWDivThumb,

+ProcA78C



Comment at: llvm/lib/Target/ARM/ARM.td:1318
+ FeatureCRC,
+ FeatureFullFP16]>;
+

DotProd here too I would think.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D93022/new/

https://reviews.llvm.org/D93022

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D93022: [ARM][AArch64] Add Cortex-A78C Support for Clang and LLVM

2020-12-24 Thread Dave Green via Phabricator via cfe-commits
dmgreen accepted this revision.
dmgreen added a comment.
This revision is now accepted and ready to land.

Thanks. I think this LGTM now.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D93022/new/

https://reviews.llvm.org/D93022

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


  1   2   3   4   >