[Bug tree-optimization/50067] [4.7 Regression] Wrong code with -fpredictive-commoning

rguenth at gcc dot gnu.org Fri, 19 Aug 2011 04:18:04 -0700

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50067


Richard Guenther <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |spop at gcc dot gnu.org

--- Comment #6 from Richard Guenther <rguenth at gcc dot gnu.org> 2011-08-19 
11:16:41 UTC ---
I think the bug is that we have

#(Data Ref: 
#  bb: 3 
#  stmt: D.2736_10 = MEM[(int *)&a + 16B];
#  ref: MEM[(int *)&a + 16B];
#  base_object: a
#  Access function 0: 16B

thus, an access function with a byte offset.  I don't see how this couldn't
happen with the 4.5 code-base and the indirect-ref handling as well.

Thus, the code in dr_analyze_indices that pushes an access-function for

  if (nest
      && (INDIRECT_REF_P (aref)
          || TREE_CODE (aref) == MEM_REF))
    {

probably shouldn't do so if the base analysis

      op = TREE_OPERAND (aref, 0);
      access_fn = analyze_scalar_evolution (loop, op);
      access_fn = instantiate_scev (before_loop, loop, access_fn);

isn't a POLYNOMIAL_CHREC.  Because then we have no idea if the
steps / element sizes are compatible.

For example for

int a[256] = { 0, 0, 0, 0, 7, 9, 0, 0, };
int main()
{
  int i;
  for (i = 0; i < 256; ++i)
    {
      a[i] = (*((char(*)[8])&a[4]))[i];
    }
}

we create

Creating dr for MEM[(char[8] *)&a + 16B][i_12]
analyze_innermost: success.
        base_address: &a
        offset from base address: 0
        constant offset from base address: 16
        step: 1
        aligned to: 128
        base_object: MEM[(char[8] *)&a]
Creating dr for a[i_12]
analyze_innermost: success.
        base_address: &a
        offset from base address: 0
        constant offset from base address: 0
        step: 4
        aligned to: 128
        base_object: a
(compute_affine_dependence
  (stmt_a =
D.2727_4 = MEM[(char[8] *)&a + 16B][i_12];
)
  (stmt_b =
a[i_12] = D.2728_5;
)
)
(Data Dep:
#(Data Ref:
#  bb: 3
#  stmt: D.2727_4 = MEM[(char[8] *)&a + 16B][i_12];
#  ref: MEM[(char[8] *)&a + 16B][i_12];
#  base_object: MEM[(char[8] *)&a];
#  Access function 0: {0, +, 1}_1
#  Access function 1: 16B
#)
#(Data Ref:
#  bb: 3
#  stmt: a[i_12] = D.2728_5;
#  ref: a[i_12];
#  base_object: a
#  Access function 0: {0, +, 1}_1
#)
    (don't know)

with a mismatched number of access functions.

On the 4.5 branch we seem to do the same and we seem to rely on the
mismatch in the number of access functions to say "don't know" for
dependence analysis.  Testcase:

extern void abort (void);
int a[6] = { 0, 0, 0, 0, 7, 0 };

int
main ()
{
  int i;
  int (*p)[1] = &a[4];
  for (i = 0; i < 4; ++i)
    {
      a[i + 1] = a[i + 2] > i;
      (*p)[i] &= ~1;
    }
  if (a[4] != 0)
    abort ();
  return 0;
}

#(Data Ref:
#  stmt: D.2727_9 = (*p_3)[i_19];
#  ref: (*p_3)[i_19];
#  base_object: (*(int[1] *) &a)[0];
#  Access function 0: {0, +, 1}_1
#  Access function 1: 16B

so the new "issue" seems to be that with invariant DRs now allowed
we have a spurious match in access functions, because with a
non-indirect ref base we do not add a access function for the base
(which would be simply 0B).

Of course then still

(Data Dep:
#(Data Ref:
#  bb: 3
#  stmt: D.2727_4 = MEM[(char[8] *)&a + 16B][i_12];
#  ref: MEM[(char[8] *)&a + 16B][i_12];
#  base_object: a
#  Access function 0: {0, +, 1}_1
#  Access function 1: 16B
#)
#(Data Ref:
#  bb: 3
#  stmt: a[i_12] = D.2728_5;
#  ref: a[i_12];
#  base_object: a
#  Access function 0: {0, +, 1}_1
#  Access function 1: 0
#)
    (no dependence)

because the code assumes that access functions affect "independent"
index spaces, if you look at

int a[256] = { 0, 0, 0, 0, 7, 9, 0, 0, };
int main()
{
  int i;
  for (i = 0; i < 256; ++i)
    {
      a[i] = (*((char(*)[8])&a[4]))[i];
    }
}

so it seems that rather than adding a separate access function for
the base pointer object (which could be dependent on i as well!)
we shouldn't do that at all.

Sebastian - do you remember anything about this code?

I'd simply do

Index: tree-data-ref.c
===================================================================
--- tree-data-ref.c     (revision 177891)
+++ tree-data-ref.c     (working copy)
@@ -862,34 +862,6 @@ dr_analyze_indices (struct data_referenc
       aref = TREE_OPERAND (aref, 0);
     }

-  if (nest
-      && (INDIRECT_REF_P (aref)
-         || TREE_CODE (aref) == MEM_REF))
-    {
-      op = TREE_OPERAND (aref, 0);
-      access_fn = analyze_scalar_evolution (loop, op);
-      access_fn = instantiate_scev (before_loop, loop, access_fn);
-      base = initial_condition (access_fn);
-      split_constant_offset (base, &base, &off);
-      if (TREE_CODE (aref) == MEM_REF)
-       off = size_binop (PLUS_EXPR, off,
-                         fold_convert (ssizetype, TREE_OPERAND (aref, 1)));
-      access_fn = chrec_replace_initial_condition (access_fn,
-                       fold_convert (TREE_TYPE (base), off));
-
-      TREE_OPERAND (aref, 0) = base;
-      VEC_safe_push (tree, heap, access_fns, access_fn);
-    }
-
-  if (TREE_CODE (aref) == MEM_REF)
-    TREE_OPERAND (aref, 1)
-      = build_int_cst (TREE_TYPE (TREE_OPERAND (aref, 1)), 0);


as that seems what is conservatively correct.

Now that will loose any index analysis on pointers ... but with
some obfuscation we can arbitrarily mix array and pointer-based
code, so ...

Relying on SCEV or access function count mismatches looks incredibly
fragile ...

For different array-views on the same array we also get bogus dependences:

int a[256] = { 0, 0, 0, 0, 7, 9, 0, 0, };
int main()
{
  int i;
  for (i = 0; i < 128; ++i)
    {
      a[i] = (*((char(*)[128])&a[0]))[i+128];
    }
}

(Data Dep:
#(Data Ref:
#  bb: 3
#  stmt: D.2728_5 = MEM[(char[128] *)&a][D.2727_4];
#  ref: MEM[(char[128] *)&a][D.2727_4];
#  base_object: a
#  Access function 0: {128, +, 1}_1
#)
#(Data Ref:
#  bb: 3
#  stmt: a[i_13] = D.2729_6;
#  ref: a[i_13];
#  base_object: a
#  Access function 0: {0, +, 1}_1
#)
    (no dependence)

so what the access_fns assume is that if ever two base objects are
the same then the access functions, in order of appearance in the
vector, are compatible (iff there is the same number of access functions).

[Bug tree-optimization/50067] [4.7 Regression] Wrong code with -fpredictive-commoning

Reply via email to