For large permuted grouped loads, the vectorizer generates
inefficient intermediate code (cleaned up only later) that runs
into complexity issues in SCEV analysis and elsewhere.  For the
non-single-element interleaving case we already put a hard limit
in place; this patch applies the same limit to the missing
single-element interleaving case.
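
As a rough illustration of the condition after this change, here is
a standalone sketch.  It is not the vectorizer code: the function
name and the macro are hypothetical, and it assumes the group size is
the absolute step divided by the element size.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical name for the cap; the patch uses the literal 4096.  */
#define MAX_SINGLE_ELEMENT_GROUP 4096

/* Sketch of the single-element interleaving test: a read whose step
   is a multiple of the element size, with a group size (assumed to be
   |step| / element size) that is positive and within the cap.  */
bool
single_element_interleaving_ok (bool is_read, long step_bytes,
                                long type_size)
{
  assert (type_size > 0);
  long groupsize = labs (step_bytes) / type_size;
  return is_read
         && (step_bytes % type_size) == 0
         && groupsize > 0
         && groupsize <= MAX_SINGLE_ELEMENT_GROUP;
}
```

Assuming a 4-byte int, the load a[d][5] in the testcase advances by
1000000 * 4 bytes per iteration of d, so the group size in this model
is 1000000 and the access is rejected by the cap.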

Bootstrapped and tested on x86_64-unknown-linux-gnu, pushed.

2021-01-11  Richard Biener  <rguent...@suse.de>

        PR tree-optimization/91403
        * tree-vect-data-refs.c (vect_analyze_group_access_1): Cap
        single-element interleaving group size at 4096 elements.

        * gcc.dg/vect/pr91403.c: New testcase.
---
 gcc/testsuite/gcc.dg/vect/pr91403.c | 11 +++++++++++
 gcc/tree-vect-data-refs.c           |  6 +++++-
 2 files changed, 16 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/gcc.dg/vect/pr91403.c

diff --git a/gcc/testsuite/gcc.dg/vect/pr91403.c b/gcc/testsuite/gcc.dg/vect/pr91403.c
new file mode 100644
index 00000000000..5b9b76060ab
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/pr91403.c
@@ -0,0 +1,11 @@
+/* { dg-do compile } */
+/* { dg-additional-options "-O3" } */
+
+extern int a[][1000000];
+int b;
+void c()
+{
+  for (int d = 2; d <= 9; d++)
+    for (int e = 32; e <= 41; e++)
+      b += a[d][5];
+}
diff --git a/gcc/tree-vect-data-refs.c b/gcc/tree-vect-data-refs.c
index c71ff7378d2..97c8577ebe7 100644
--- a/gcc/tree-vect-data-refs.c
+++ b/gcc/tree-vect-data-refs.c
@@ -2538,7 +2538,11 @@ vect_analyze_group_access_1 (vec_info *vinfo, dr_vec_info *dr_info)
         size.  */
       if (DR_IS_READ (dr)
          && (dr_step % type_size) == 0
-         && groupsize > 0)
+         && groupsize > 0
+         /* This could be UINT_MAX but as we are generating code in a very
+            inefficient way we have to cap earlier.
+            See PR91403 for example.  */
+         && groupsize <= 4096)
        {
          DR_GROUP_FIRST_ELEMENT (stmt_info) = stmt_info;
          DR_GROUP_SIZE (stmt_info) = groupsize;
-- 
2.26.2
