[Bug middle-end/38492] [graphite] segfaulting code when compiled with -fgraphite -fgraphite-identity

2008-12-18 Thread jsjodin at gcc dot gnu dot org


--- Comment #6 from jsjodin at gcc dot gnu dot org  2008-12-18 19:39 ---
> This still fails here:
> 
> gfortran  -v -O2 -g -ffree-form -fgraphite -fgraphite-identity -cpp -D__FFTSG
> PR38492.f90
> 

I looked into this failure. It fails because the number of iterations cannot be
computed (chrec_unknown) when the loop domain is translated, and is ignored by
scan_tree_for_params. In general it can be dangerous to throw away
chrec_unknown, because it may code that we must generate. 
  The patch will prevent the scop from being transformed if the number of
iterations cannot be analyzed, and it will no longer ignore chrec_unknown. I
will run some more tests to make sure it does not break anything else. The
patch is below:

Index: graphite.c
===
--- graphite.c  (revision 142764)
+++ graphite.c  (working copy)
@@ -2356,6 +2356,7 @@ graphite_loop_normal_form (loop_p loop)
   tree nit;
   gimple_seq stmts;
   edge exit = single_dom_exit (loop);
+  tree iv = NULL_TREE;

   if (!number_of_iterations_exit (loop, exit, &niter, false))
 gcc_unreachable ();
@@ -2369,7 +2370,14 @@ graphite_loop_normal_form (loop_p loop)
   if (nb_reductions_in_loop (loop) > 0)
 return NULL_TREE;

-  return canonicalize_loop_ivs (loop, NULL, nit);
+  iv = canonicalize_loop_ivs (loop, NULL, nit);
+
+  nit = number_of_latch_executions (loop);
+  nit = analyze_scalar_evolution (loop, nit);
+  if (nit == chrec_dont_know)
+return NULL_TREE;
+
+  return iv;
 }

 /* Record LOOP as occuring in SCOP.  Returns true when the operation
@@ -2602,7 +2610,7 @@ scan_tree_for_params (scop_p s, tree e, 
   int cst_col, param_col;

   if (e == chrec_dont_know)
-return;
+gcc_unreachable ();

   switch (TREE_CODE (e))
     {


-- 

jsjodin at gcc dot gnu dot org changed:

   What|Removed |Added

 CC|                    |jsjodin at gcc dot gnu dot
   |        |org


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=38492



[Bug tree-optimization/33680] [4.3 Regression] ICE when compilling elbg.c from ffmpeg (vectorizer)

2007-10-11 Thread jsjodin at gcc dot gnu dot org


--- Comment #15 from jsjodin at gcc dot gnu dot org  2007-10-11 14:17 
---
(In reply to comment #14)
> BTW, without this patch 
> http://gcc.gnu.org/ml/gcc-patches/2007-07/msg02122.html
> there is no ICE and the loop gets vectorized.
> 
> Ira
> 

It may be that the test to go through an SSA_NAME in split_constant_offset is
still not strict enough. I was having problem knowing what to check for. See
thread: http://gcc.gnu.org/ml/gcc-patches/2007-07/msg01294.html

--- trunk/gcc/tree-data-ref.c   (revision 126953)
+++ trunk/gcc/tree-data-ref.c   (working copy)
@@ -565,6 +565,27 @@ split_constant_offset (tree exp, tree *v
return;
   }

+case SSA_NAME:
+  {
+   tree def_stmt = SSA_NAME_DEF_STMT (exp);
+   if (TREE_CODE (def_stmt) == GIMPLE_MODIFY_STMT)
+ {
+   tree def_stmt_rhs = GIMPLE_STMT_OPERAND (def_stmt, 1);
+
+   if (!TREE_SIDE_EFFECTS (def_stmt_rhs) 
+   && EXPR_P (def_stmt_rhs)
+   && !REFERENCE_CLASS_P (def_stmt_rhs))
+ {
+   split_constant_offset (def_stmt_rhs, &var0, &off0);
+   var0 = fold_convert (type, var0);
+   *var = var0;
+   *off = off0;
+   return;


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=33680



[Bug tree-optimization/34043] New: Missed optimization causing extra loads and stores when using x86_64 builtin function together with aggregate types.

2007-11-09 Thread jsjodin at gcc dot gnu dot org
Compile the attached file test.cpp with the following flags:
g++ -DNDEBUG -m32 -msse2 -O2 -msse2 -DSHOW_BUG gcc_bug.cpp -S

In the generated code there are useless stores and loads of %xmm0 to
-40(%eps) and -56(%eps).  If the code is compiled without -DSHOW_BUG
it will generate a more optimal version without the extra memory
accesses. See the attached generated files test_bad.s and test_good.s
for the resulting code. This is an old problem existing in 4.1.x.


-- 
   Summary: Missed optimization causing extra loads and stores when
using x86_64 builtin function together with aggregate
types.
   Product: gcc
   Version: 4.3.0
Status: UNCONFIRMED
  Keywords: missed-optimization
  Severity: normal
  Priority: P3
 Component: tree-optimization
AssignedTo: unassigned at gcc dot gnu dot org
ReportedBy: jsjodin at gcc dot gnu dot org


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=34043



[Bug tree-optimization/34043] Missed optimization causing extra loads and stores when using x86_64 builtin function together with aggregate types.

2007-11-09 Thread jsjodin at gcc dot gnu dot org


--- Comment #1 from jsjodin at gcc dot gnu dot org  2007-11-09 16:16 ---
Created an attachment (id=14516)
 --> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=14516&action=view)
Source code to expose the missed optimization


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=34043



[Bug tree-optimization/34043] Missed optimization causing extra loads and stores when using x86_64 builtin function together with aggregate types.

2007-11-09 Thread jsjodin at gcc dot gnu dot org


--- Comment #2 from jsjodin at gcc dot gnu dot org  2007-11-09 16:16 ---
Created an attachment (id=14517)
 --> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=14517&action=view)
Assembly code for bad code.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=34043



[Bug tree-optimization/34043] Missed optimization causing extra loads and stores when using x86_64 builtin function together with aggregate types.

2007-11-09 Thread jsjodin at gcc dot gnu dot org


--- Comment #3 from jsjodin at gcc dot gnu dot org  2007-11-09 16:17 ---
Created an attachment (id=14518)
 --> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=14518&action=view)
Assembly code for good code.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=34043



[Bug tree-optimization/34043] Missed optimization causing extra loads and stores when using x86_64 builtin function together with aggregate types.

2007-11-13 Thread jsjodin at gcc dot gnu dot org


--- Comment #5 from jsjodin at gcc dot gnu dot org  2007-11-13 17:07 ---
(In reply to comment #4)
> Related to PR 33790 (and most likely fixed by it).  There is another issue 
> with
> that bug relating to not deleting the extra store.
> 

Indeed the extra load disappeared when with the patch. The store did not get
deleted as expected. I looked at the differences between the good and bad case. 
Compiling the good case has the following sequence before the fre pass:
Note: src and dst are unions

src.f = D.9650_45;
D.9630_31 = src.f;
D.9655_46 = __builtin_ia32_addps (D.9630_31, D.9630_31);
dst.f = D.9655_46;
D.9632_33 = dst.f;

After fre the temps have been propagated and replaced the uses of dst.f:

src.f = D.9650_45;  
D.9630_31 = D.9650_45;  
D.9655_46 = __builtin_ia32_addps (D.9630_31, D.9630_31);
dst.f = D.9655_46;  
D.9632_33 = D.9655_46;

The extra stores to src.f are eliminated in dce. 

The bad case has the following code before and after fre:
src.i = D.9651_44;  
D.9630_31 = src.f;  
D.9655_45 = __builtin_ia32_addps (D.9630_31, D.9630_31);
dst.f = D.9655_45;  
D.9632_33 = dst.i;   

Since the src.i and src.f are probably not considered to be the same the
propagation does not work. It might be possible to handle this case if one
consideres the size of data being written and read from unions.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=34043



[Bug tree-optimization/34043] Missed optimization causing extra loads and stores when using x86_64 builtin function together with aggregate types.

2007-12-03 Thread jsjodin at gcc dot gnu dot org


--- Comment #6 from jsjodin at gcc dot gnu dot org  2007-12-03 15:41 ---
> Indeed the extra load disappeared when with the patch. The store did not get
> deleted as expected. I looked at the differences between the good and bad 
> case. 

After doing some more investigation an attempt to fix the problem was done in
the value numbering pass. By using the size and a unique type for the size to
compute the value number it not longer matters which union that is accessed,
only the number of bits that are read/written. This gives good code test.cpp,
however the problem with this approach is that in FRE the different types
result in a cast, but it should really be a reinterpretation of the bits
(different types, same bit pattern). The small example below gives: int: -1,
float: -1.00 which is wrong. The expected result is: int: -1, float: nan

The solution would be to make the reinterpretation possible somehow. 


Simple test:
---
#include 

union foo {
  int i;
  float f;
} bar;


int main()
{
  int x = 0x;
  float y = 0.0;
  bar.i = x;
  y = bar.f;
  printf("int: %d, float: %f\n", bar.i, bar.f); 

  return 0;
}
-




Index: tree-ssa-sccvn.c
===
--- tree-ssa-sccvn.c(revision 130099)
+++ tree-ssa-sccvn.c(working copy)
@@ -456,6 +456,33 @@ shared_vuses_from_stmt (tree stmt)
   return shared_lookup_vops;
 }

+
+static tree get_generic_type_node_from_size(int size)
+{
+  tree result = NULL;
+  switch(size)
+{
+case 8:
+  result = unsigned_intQI_type_node;
+  break;
+case 16:
+  result = unsigned_intHI_type_node;
+  break;
+case 32:
+  result = unsigned_intSI_type_node;
+  break;
+case 64:
+  result = unsigned_intDI_type_node;
+  break;
+case 128:
+  result = unsigned_intTI_type_node;
+  break;
+default:
+  break;
+}
+  return result;
+}
+
 /* Copy the operations present in load/store/call REF into RESULT, a vector of
vn_reference_op_s's.  */

@@ -521,7 +548,23 @@ copy_reference_ops_from_ref (tree ref, V
  break;
case COMPONENT_REF:
  /* Record field as operand.  */
- temp.op0 = TREE_OPERAND (ref, 1);
+ {
+   tree aggregate = TREE_OPERAND (ref, 0);
+   tree aggregate_type = TREE_TYPE(aggregate);
+   tree field_decl = TREE_OPERAND (ref, 1);
+   tree field_decl_type = TREE_TYPE(field_decl);
+   tree field_decl_type_size = TYPE_SIZE(field_decl_type);
+   if (TREE_CODE (aggregate_type) == UNION_TYPE
+   && TREE_CODE (field_decl_type_size) == INTEGER_CST)
+ {
+   temp.op0 = field_decl_type_size;
+   temp.type =
get_generic_type_node_from_size(TREE_INT_CST_LOW(field_decl_type_size)) != NULL 
+ ?
get_generic_type_node_from_size(TREE_INT_CST_LOW(field_decl_type_size)) 
+ : temp.type;
+ } else {
+   temp.op0 = TREE_OPERAND (ref, 1);
+ }
+ }
  break;
case ARRAY_RANGE_REF:
case ARRAY_REF:



-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=34043



[Bug tree-optimization/34043] Missed optimization causing extra loads and stores when using x86_64 builtin function together with aggregate types.

2007-12-12 Thread jsjodin at gcc dot gnu dot org


--- Comment #8 from jsjodin at gcc dot gnu dot org  2007-12-13 01:56 ---
(In reply to comment #7)
> >+static tree get_generic_type_node_from_size(int size)
> 
> This function needs lots of improvement because the size of the modes is based
> on UNIT_BIT_SIZE (I think that is the macro name) and really you can go from a
> size to a mode and then to a tree type.
> 
> Note this patch really fixes my bug (PR 32964) if PRE is extended to handle
> this too.
> 

Yes I rewrote the patch to always use the first member of a union that match a
specific size. When the expression was extracted (queried from VN function) a
VIEW_CONVERT_EXPR would be insterted if the expected type did not match the
stored expression type. This caused some problems (assertions) in the compiler
that I have not been able to track down yet, although it fixed both my test
cases and for constants it doesn't cause any problems since the expression gets
folded. To avoid introducing VIEW_CONVERT_EXPR it may be possible to return an
expression (from VN) that refers to a union member with a matching type
instead. It would mean that the value number would be the same, but the
returned expression might be different. 


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=34043