Hi Richard,

This is with reference to our discussion at GNU Tools Cauldron 2015 regarding
my talk titled "Improving the effectiveness and generality of GCC
auto-vectorization." Following up on our prototype implementation of the
concept, we have now started implementing it in GCC.

We are following an incremental model to add language support in our
front-end, and the corresponding back-end support (for the auto-vectorizer)
will be added for feature completion.

Given the complexity and scale of the project, we have divided it into the
subtasks listed below for ease of implementation, testing and review.

0. Add a new pass to perform auto-vectorization using the unified
representation - The current GCC framework does not give a complete overview
of the loop to be vectorized: it either breaks the loop across its body or
across iterations. As a result, the existing data structures cannot be reused
for our approach, which gathers all the information about the loop body in
one place using primitive permute operations. Hence, we define new data
structures and populate them.

1. Add support for vectorization of LOAD/STORE instructions
    a. Create the permute order tree for a loop containing LOAD and STORE
instructions on single- or multi-dimensional arrays and aggregates within
nested loops.
    b. Basic transformation phase to generate vectorized code for the
primitive reorder tree produced at stage 1a using a tree tiling algorithm.
This phase handles code generation for SCATTER, GATHER, strided memory
accesses etc., along with permute instruction generation.

2. Implementation of k-arity promotion/reduction : The permute nodes within
the primitive reorder tree generated from the input program can have any
arity, whereas most targets support an arity of at most 2. Hence, we need to
promote or reduce the arity of the permute order tree to enable successful
tree tiling (see the small example after this list).

3. Vector size reduction : Depending upon the vector size of the target,
reduce the vector size per statement and adjust the trip count of the
vectorized loop accordingly.

4. Support simple arithmetic operations :
    a. Add support for analyzing statements with simple arithmetic operations
like +, -, *, / for vectorization, and create the primitive reorder tree with
compute_op.
    b. Generate vector code for the primitive reorder tree produced at stage
4a using the tree tiling algorithm - here, complex patterns like multiply-add
should be detected and the appropriate instructions generated.

5. Support reduction operations :
    a. Add support for reduction operation analysis and primitive reorder
tree generation. Reduction needs special handling, as the finish statement
must COLLAPSE the temporary reduction vector TEMP_VAR into the original
reduction variable.
    b. Code generation for the primitive reorder tree needs no special
handling - the reduction tree is the same as the tree generated at stage 4a,
the only difference being that in 4a the destination is a MEMREF (because of
the STORE operation) whereas for a reduction it is TEMP_VAR. At this stage,
generate code for the COLLAPSE node in the finish statements.

6. Support other vectorizable statements like complex arithmetic operations, 
bitwise operations, type conversions etc.
    a. Add support for analysis and primitive reorder tree generation.
    b. Vector code generation.

7. Cost-effective tree tiling algorithm : So far, tree tiling happens without
considering the cost of computation. However, multiple target instructions
may cover the same tree - hence, instead of picking the first matching
largest instruction cover, select the cover based on the instruction costs
given in the target's .md file.

8. Optimizations on the created primitive reorder tree : This stage is open
ended; its scope will be defined based on performance analysis.
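
To illustrate subtask 2 above, here is a small stand-alone sketch (not part
of the patch; names and layout are only illustrative) showing why arity
reduction is legal: a four-way interleave is equivalent to a tree of two-way
interleaves, so a permute node of arity 4 can be lowered for a target whose
permute instructions take only two inputs.

  #include <stdio.h>

  #define N 4

  int
  main (void)
  {
    int a[N], b[N], c[N], d[N];
    int direct[4 * N], t1[2 * N], t2[2 * N], twostep[4 * N];

    for (int i = 0; i < N; i++)
      {
        a[i] = i; b[i] = 10 + i; c[i] = 20 + i; d[i] = 30 + i;

        /* Arity-4 interleave done in one step.  */
        direct[4 * i + 0] = a[i];
        direct[4 * i + 1] = b[i];
        direct[4 * i + 2] = c[i];
        direct[4 * i + 3] = d[i];

        /* First level of arity-2 interleaves.  */
        t1[2 * i] = a[i];  t1[2 * i + 1] = c[i];
        t2[2 * i] = b[i];  t2[2 * i + 1] = d[i];
      }

    /* Second level: interleave the two intermediate streams.  */
    for (int i = 0; i < 2 * N; i++)
      {
        twostep[2 * i] = t1[i];
        twostep[2 * i + 1] = t2[i];
      }

    for (int i = 0; i < 4 * N; i++)
      if (direct[i] != twostep[i])
        return 1;

    printf ("INTERLEAVE<4> == INTERLEAVE<2> of two INTERLEAVE<2>s\n");
    return 0;
  }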

The patch attached herewith handles stages 0 and 1a : it adds a new pass to
perform auto-vectorization using the unified representation, defines the new
data structures needed for this, and creates the primitive reorder tree for
LOAD/STORE instructions within the loop.

The whole loop is represented using an ITER_NODE, which holds information
about
- the preparatory statements to be executed before entering the loop (such as
initialization of vectors, prepping for reduction operations, peeling etc.),
- the vectorizable loop body, represented as a PRIMOP_TREE (primitive reorder
tree),
- the final statements (for peeling, variable loop bounds, the COLLAPSE
operation for reductions etc.), and
- other loop attributes (loop bound, peeling needed, dependences, etc.).
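
For reference, here is a rough sketch of the shape of ITER_node as implied by
the accessor macros used in tree-vect-unified.c (the authoritative definition
lives in tree-vect-unified.h; the field names below are only illustrative):

  struct ITER_node
  {
    struct loop *loop;                    /* ITER_NODE_LOOP  */
    tree niters;                          /* ITER_NODE_NITERS  */
    vec<data_reference_p> datarefs;       /* ITER_NODE_DATA_REFS  */
    vec<ddr_p> ddrs;                      /* ITER_NODE_DATA_DEPS  */
    vec<gimple *> prologue;               /* ITER_NODE_PROLOGUE  */
    vec<struct primop_tree *> loop_body;  /* ITER_NODE_LOOP_BODY  */
    vec<gimple *> epilogue;               /* ITER_NODE_EPILOGUE  */
    int loop_peel_needed;                 /* ITER_NODE_LOOP_PEEL_NEEDED  */
    bool peeling_for_gaps;                /* ITER_NODE_PEELING_FOR_GAPS  */
    bool peeling_for_niter;               /* ITER_NODE_PEELING_FOR_NITER  */
  };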

Memory accesses within a loop follow a definite repetitive pattern which can
be captured using primitive permute operators; these determine the desired
permute order for the vector computations. The PRIMOP_TREE is an AST which
records all computations and permutations required to store the destination
vector into contiguous memory at the end of all iterations of the loop. It
can have INTERLEAVE, CONCAT, EXTRACT, SPLIT, ITER or any compute operation as
an intermediate node, while leaf nodes can be memory references, constants or
vectors of loop invariants. Depending upon the operation, a PRIMOP_TREE node
holds the appropriate information about the statement within the loop that is
necessary for vectorization.
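
As a purely illustrative example (the exact node shapes and parameters are
defined by the implementation, so take the sketch below as hypothetical), a
loop such as

  for (i = 0; i < N; i++)
    {
      D[2*i]     = A[i];
      D[2*i + 1] = B[i];
    }

would be captured by a tree rooted at the store into D along the lines of

  ITER
    INTERLEAVE (arity 2)
      MEMREF A      <- lands in the even lanes of D
      MEMREF B      <- lands in the odd lanes of D

i.e. the INTERLEAVE node records that the vectors loaded from A and B must be
interleaved before being stored contiguously into D.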

At this stage, these data structures are populated by gathering all the
information about the loop, the statements within it, and the correlation
between those statements. The loop body is also analyzed to check whether
vectorization of each statement is possible. Note, however, that this
analysis phase only gives a worst-case estimate of instruction selection, as
it merely checks whether a matching named pattern is defined in the target's
.md file. It does not necessarily give the optimal cover, which is the aim of
the transformation phase using the tree tiling algorithm - and that phase can
only be invoked once the loop body is represented using the primitive reorder
tree.

At this stage, the focus is on creating the permute order tree for loops with
LOAD and STORE instructions only. The code we intend to compile is of the form

FOR (i = 0; i < N; i++)
{
  stmt 1 : D[k*i + d_1] = S_1[k*i + c_11]
  stmt 2 : D[k*i + d_2] = S_1[k*i + c_21]
  ...
  stmt k : D[k*i + d_k] = S_1[k*i + c_k1]
}
Here we assume that any data reference can be represented as base + k * index
+ offset (the struct data_reference from GCC is currently used for this
purpose). If not, the address is normalized to convert it into such a
representation.
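
For instance, the following is a concrete instance of the above form with
k = 2 (the array names and constants are chosen only for illustration):

  void
  foo (int *D, int *S1, int N)
  {
    for (int i = 0; i < N; i++)
      {
        D[2 * i]     = S1[2 * i + 1];   /* d_1 = 0, c_11 = 1  */
        D[2 * i + 1] = S1[2 * i];       /* d_2 = 1, c_21 = 0  */
      }
  }

Here each reference decomposes as base + k * index + offset: S1[2 * i + 1],
for example, has base S1, k = 2 and offset 1 (counted in elements; DR_INIT
and DR_STEP hold the byte equivalents).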

We are looking forward to your suggestions and insights for the better
execution of this project.

Also, as this is a long-term project, can we create a branch in GCC to keep
all our patches in one place, so that anyone interested can download and
experiment with the code?

- Thanks and regards,
  Sameera D.
Index: gcc/Makefile.in
===================================================================
--- gcc/Makefile.in	(revision 236862)
+++ gcc/Makefile.in	(working copy)
@@ -1522,6 +1522,7 @@
 	tree-vect-loop-manip.o \
 	tree-vect-slp.o \
 	tree-vectorizer.o \
+	tree-vect-unified.o \
 	tree-vrp.o \
 	tree.o \
 	valtrack.o \
Index: gcc/common.opt
===================================================================
--- gcc/common.opt	(revision 236862)
+++ gcc/common.opt	(working copy)
@@ -2570,6 +2570,10 @@
 Common Report Var(flag_tree_loop_vectorize) Optimization
 Enable loop vectorization on trees.
 
+ftree-loop-vectorize-unified
+Common Report Var(flag_tree_loop_vectorize_unified) Optimization
+Enable loop vectorization using unified representation on trees.
+
 ftree-slp-vectorize
 Common Report Var(flag_tree_slp_vectorize) Optimization
 Enable basic block vectorization (SLP) on trees.
Index: gcc/hsa.c
===================================================================
--- gcc/hsa.c	(revision 236862)
+++ gcc/hsa.c	(working copy)
@@ -840,6 +840,7 @@
     fn_opts = optimization_default_node;
   fn_opts = copy_node (fn_opts);
   TREE_OPTIMIZATION (fn_opts)->x_flag_tree_loop_vectorize = false;
+  TREE_OPTIMIZATION (fn_opts)->x_flag_tree_loop_vectorize_unified = false;
   TREE_OPTIMIZATION (fn_opts)->x_flag_tree_slp_vectorize = false;
   DECL_FUNCTION_SPECIFIC_OPTIMIZATION (gdecl) = fn_opts;
 
Index: gcc/passes.def
===================================================================
--- gcc/passes.def	(revision 236862)
+++ gcc/passes.def	(working copy)
@@ -278,6 +278,11 @@
 	  NEXT_PASS (pass_expand_omp_ssa);
 	  NEXT_PASS (pass_ch_vect);
 	  NEXT_PASS (pass_if_conversion);
+	  NEXT_PASS (pass_unified_vectorize);
+          PUSH_INSERT_PASSES_WITHIN (pass_unified_vectorize)
+	      NEXT_PASS (pass_dce);
+          POP_INSERT_PASSES ()
+
 	  /* pass_vectorize must immediately follow pass_if_conversion.
 	     Please do not add any other passes in between.  */
 	  NEXT_PASS (pass_vectorize);
Index: gcc/tree-data-ref.c
===================================================================
--- gcc/tree-data-ref.c	(revision 236862)
+++ gcc/tree-data-ref.c	(working copy)
@@ -749,16 +749,19 @@
   return build_fold_addr_expr (TREE_OPERAND (addr, 0));
 }
 
-/* Analyzes the behavior of the memory reference DR in the innermost loop or
-   basic block that contains it.  Returns true if analysis succeed or false
-   otherwise.  */
-
+/* Analyze the behavior of the memory reference REF in the innermost loop or
+   basic block that contains it.  STMT is the statement containing the
+   memory reference.  Returns true if the analysis succeeds, false
+   otherwise.  On success, REF is represented as
+   PBASE + PSTEP * iter# + POFF.  The initial address of the memory
+   reference is returned in PINIT, and the desired alignment in
+   ALIGNED_TO.  */
 bool
-dr_analyze_innermost (struct data_reference *dr, struct loop *nest)
+memref_analyze_innermost (gimple *stmt, tree ref, struct loop *nest,
+			  tree *pbase, tree *pstep, tree *poff, tree *pinit,
+			  tree *aligned_to)
 {
-  gimple *stmt = DR_STMT (dr);
   struct loop *loop = loop_containing_stmt (stmt);
-  tree ref = DR_REF (dr);
   HOST_WIDE_INT pbitsize, pbitpos;
   tree base, poffset;
   machine_mode pmode;
@@ -872,13 +875,13 @@
 		     fold_convert (ssizetype, base_iv.step),
 		     fold_convert (ssizetype, offset_iv.step));
 
-  DR_BASE_ADDRESS (dr) = canonicalize_base_object_address (base_iv.base);
+  *pbase = canonicalize_base_object_address (base_iv.base);
 
-  DR_OFFSET (dr) = fold_convert (ssizetype, offset_iv.base);
-  DR_INIT (dr) = init;
-  DR_STEP (dr) = step;
+  *poff = fold_convert (ssizetype, offset_iv.base);
+  *pinit = init;
+  *pstep = step;
 
-  DR_ALIGNED_TO (dr) = size_int (highest_pow2_factor (offset_iv.base));
+  *aligned_to = size_int (highest_pow2_factor (offset_iv.base));
 
   if (dump_file && (dump_flags & TDF_DETAILS))
     fprintf (dump_file, "success.\n");
@@ -886,6 +889,22 @@
   return true;
 }
 
+/* Analyzes the behavior of the memory reference DR in the innermost loop or
+   basic block that contains it.  Returns true if analysis succeed or false
+   otherwise.  */
+
+bool
+dr_analyze_innermost (struct data_reference *dr, struct loop *nest)
+{
+  bool ret;
+
+  ret = memref_analyze_innermost (DR_STMT (dr), DR_REF (dr), nest,
+		&(DR_BASE_ADDRESS (dr)), &(DR_STEP (dr)), &(DR_OFFSET (dr)),
+		&(DR_INIT (dr)), &(DR_ALIGNED_TO (dr)));
+
+  return ret;
+}
+
 /* Determines the base object and the list of indices of memory reference
    DR, analyzed in LOOP and instantiated in loop nest NEST.  */
 
Index: gcc/tree-data-ref.h
===================================================================
--- gcc/tree-data-ref.h	(revision 236862)
+++ gcc/tree-data-ref.h	(working copy)
@@ -341,7 +341,8 @@
 			    const struct data_reference *, bool);
 extern bool dr_equal_offsets_p (struct data_reference *,
                                 struct data_reference *);
-
+extern bool memref_analyze_innermost (gimple *, tree, struct loop *, tree *,
+				      tree *, tree *, tree *, tree *);
 /* Return true when the base objects of data references A and B are
    the same memory object.  */
 
Index: gcc/tree-pass.h
===================================================================
--- gcc/tree-pass.h	(revision 236862)
+++ gcc/tree-pass.h	(working copy)
@@ -375,6 +375,7 @@
 extern gimple_opt_pass *make_pass_graphite (gcc::context *ctxt);
 extern gimple_opt_pass *make_pass_graphite_transforms (gcc::context *ctxt);
 extern gimple_opt_pass *make_pass_if_conversion (gcc::context *ctxt);
+extern gimple_opt_pass *make_pass_unified_vectorize (gcc::context *ctxt);
 extern gimple_opt_pass *make_pass_loop_distribution (gcc::context *ctxt);
 extern gimple_opt_pass *make_pass_vectorize (gcc::context *ctxt);
 extern gimple_opt_pass *make_pass_simduid_cleanup (gcc::context *ctxt);
Index: gcc/tree-ssa-loop.c
===================================================================
--- gcc/tree-ssa-loop.c	(revision 236862)
+++ gcc/tree-ssa-loop.c	(working copy)
@@ -398,7 +398,7 @@
   /* opt_pass methods: */
   virtual bool gate (function *fun)
     {
-      return flag_tree_loop_vectorize || fun->has_force_vectorize_loops;
+      return (flag_tree_loop_vectorize && !flag_tree_loop_vectorize_unified) || fun->has_force_vectorize_loops;
     }
 
   virtual unsigned int execute (function *);
Index: gcc/tree-vect-data-refs.c
===================================================================
--- gcc/tree-vect-data-refs.c	(revision 236862)
+++ gcc/tree-vect-data-refs.c	(working copy)
@@ -136,16 +136,9 @@
   return scalar_type;
 }
 
-
-/* Insert DDR into LOOP_VINFO list of ddrs that may alias and need to be
-   tested at run-time.  Return TRUE if DDR was successfully inserted.
-   Return false if versioning is not supported.  */
-
-static bool
-vect_mark_for_runtime_alias_test (ddr_p ddr, loop_vec_info loop_vinfo)
+bool
+vect_mark_for_runtime_alias_test_1 (ddr_p ddr, loop *loop)
 {
-  struct loop *loop = LOOP_VINFO_LOOP (loop_vinfo);
-
   if ((unsigned) PARAM_VALUE (PARAM_VECT_MAX_VERSION_FOR_ALIAS_CHECKS) == 0)
     return false;
 
@@ -189,11 +182,28 @@
       return false;
     }
 
-  LOOP_VINFO_MAY_ALIAS_DDRS (loop_vinfo).safe_push (ddr);
   return true;
 }
 
 
+
+/* Insert DDR into LOOP_VINFO list of ddrs that may alias and need to be
+   tested at run-time.  Return TRUE if DDR was successfully inserted.
+   Return false if versioning is not supported.  */
+
+static bool
+vect_mark_for_runtime_alias_test (ddr_p ddr, loop_vec_info loop_vinfo)
+{
+  struct loop *loop = LOOP_VINFO_LOOP (loop_vinfo);
+  bool is_alias;
+
+  is_alias = vect_mark_for_runtime_alias_test_1 (ddr, loop);
+  if (is_alias)
+    LOOP_VINFO_MAY_ALIAS_DDRS (loop_vinfo).safe_push (ddr);
+  return is_alias;
+}
+
+
 /* Function vect_analyze_data_ref_dependence.
 
    Return TRUE if there (might) exist a dependence between a memory-reference
@@ -3181,16 +3191,35 @@
 vect_check_gather_scatter (gimple *stmt, loop_vec_info loop_vinfo, tree *basep,
 			   tree *offp, int *scalep)
 {
-  HOST_WIDE_INT scale = 1, pbitpos, pbitsize;
   struct loop *loop = LOOP_VINFO_LOOP (loop_vinfo);
   stmt_vec_info stmt_info = vinfo_for_stmt (stmt);
   struct data_reference *dr = STMT_VINFO_DATA_REF (stmt_info);
+  tree base;
+  bool is_read;
+  tree vec_type;
+
+  base = DR_REF (dr);
+  is_read = DR_IS_READ (dr);
+  vec_type = STMT_VINFO_VECTYPE (stmt_info);
+
+  return vect_check_gather_scatter_1 (stmt, base, is_read, loop, vec_type,
+				       basep, offp, scalep);
+}
+
+/* Check whether a non-affine read or write in stmt is suitable for gather load
+   or scatter store and if so, return a builtin decl for that operation.  */
+
+tree
+vect_check_gather_scatter_1 (gimple *stmt, tree base, bool is_read,
+			     struct loop *loop, tree vec_type, tree *basep,
+			     tree *offp, int *scalep)
+{
+  HOST_WIDE_INT scale = 1, pbitpos, pbitsize;
   tree offtype = NULL_TREE;
-  tree decl, base, off;
+  tree decl, off;
   machine_mode pmode;
   int punsignedp, reversep, pvolatilep = 0;
 
-  base = DR_REF (dr);
   /* For masked loads/stores, DR_REF (dr) is an artificial MEM_REF,
      see if we can use the def stmt of the address.  */
   if (is_gimple_call (stmt)
@@ -3371,12 +3400,10 @@
   if (offtype == NULL_TREE)
     offtype = TREE_TYPE (off);
 
-  if (DR_IS_READ (dr))
-    decl = targetm.vectorize.builtin_gather (STMT_VINFO_VECTYPE (stmt_info),
-					     offtype, scale);
+  if (is_read)
+    decl = targetm.vectorize.builtin_gather (vec_type, offtype, scale);
   else
-    decl = targetm.vectorize.builtin_scatter (STMT_VINFO_VECTYPE (stmt_info),
-					      offtype, scale);
+    decl = targetm.vectorize.builtin_scatter (vec_type, offtype, scale);
 
   if (decl == NULL_TREE)
     return NULL_TREE;
Index: gcc/tree-vect-loop.c
===================================================================
--- gcc/tree-vect-loop.c	(revision 236862)
+++ gcc/tree-vect-loop.c	(working copy)
@@ -709,7 +709,7 @@
    FORNOW: A simple evolution of an induction variables in the loop is
    considered a polynomial evolution.  */
 
-static bool
+bool
 vect_is_simple_iv_evolution (unsigned loop_nb, tree access_fn, tree * init,
                              tree * step)
 {
Index: gcc/tree-vect-unified.c
===================================================================
--- gcc/tree-vect-unified.c	(revision 0)
+++ gcc/tree-vect-unified.c	(working copy)
@@ -0,0 +1,2279 @@
+/* Loop vectorization using unified representation.
+
+This file is part of GCC.
+
+GCC is free software; you can redistribute it and/or modify it under
+the terms of the GNU General Public License as published by the Free
+Software Foundation; either version 3, or (at your option) any later
+version.
+
+GCC is distributed in the hope that it will be useful, but WITHOUT ANY
+WARRANTY; without even the implied warranty of MERCHANTABILITY or
+FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
+for more details.
+
+You should have received a copy of the GNU General Public License
+along with GCC; see the file COPYING3.  If not see
+<http://www.gnu.org/licenses/>.  */
+
+/* Loop autovectorization using unified representation for permute
+   instructions.  */
+#if 1
+#include "config.h"
+#include "system.h"
+#include "coretypes.h"
+#include "backend.h"
+#include "tree.h"
+#include "gimple.h"
+#include "predict.h"
+#include "tree-pass.h"
+#include "ssa.h"
+#include "cgraph.h"
+#include "fold-const.h"
+#include "stor-layout.h"
+#include "gimple-iterator.h"
+#include "gimple-walk.h"
+#include "tree-ssa-loop-manip.h"
+#include "tree-cfg.h"
+#include "cfgloop.h"
+#include "tree-vectorizer.h"
+#include "tree-ssa-propagate.h"
+#include "dbgcnt.h"
+#include "tree-scalar-evolution.h"
+#include "tree-vect-unified.h"
+#include "tree-pretty-print.h"
+#include "target.h"
+#include "rtl.h"
+#include "tm_p.h"
+#include "optabs-tree.h"
+#include "dumpfile.h"
+#include "alias.h"
+#include "tree-eh.h"
+#include "gimplify.h"
+#include "gimplify-me.h"
+#include "tree-ssa-loop-ivopts.h"
+#include "tree-ssa-loop.h"
+#include "expr.h"
+#include "builtins.h"
+#include "params.h"
+
+namespace {
+
+const pass_data pass_data_unified_vectorize =
+{
+  GIMPLE_PASS, /* type */
+  "unified-vect", /* name */
+  OPTGROUP_LOOP | OPTGROUP_VEC, /* optinfo_flags */
+  TV_TREE_VECTORIZATION, /* tv_id */
+  ( PROP_cfg | PROP_ssa ), /* properties_required */
+  0, /* properties_provided */
+  0, /* properties_destroyed */
+  0, /* todo_flags_start */
+  0, /* todo_flags_finish */
+};
+
+class pass_unified_vectorize : public gimple_opt_pass
+{
+public:
+  pass_unified_vectorize (gcc::context *ctxt)
+    : gimple_opt_pass (pass_data_unified_vectorize, ctxt)
+  {}
+
+  /* opt_pass methods: */
+  virtual bool gate (function *fun)
+    {
+      return (flag_tree_loop_vectorize && flag_tree_loop_vectorize_unified)
+	      || fun->has_force_vectorize_loops;
+    }
+
+  virtual unsigned int execute (function *);
+
+}; // class pass_unified_vectorize
+
+unsigned int
+pass_unified_vectorize::execute (function *fun)
+{
+  if (number_of_loops (fun) <= 1)
+    return 0;
+
+  return vectorize_loops_using_uniop ();
+}
+
+} // anon namespace
+
+gimple_opt_pass *
+make_pass_unified_vectorize (gcc::context *ctxt)
+{
+  return new pass_unified_vectorize (ctxt);
+}
+
+static struct ITER_node *
+new_iter_node (struct loop *loop)
+{
+  struct ITER_node *res;
+  basic_block *bbs;
+  gimple_stmt_iterator si;
+  int i;
+
+  res = (struct ITER_node *) xcalloc (1, sizeof (struct ITER_node));
+  ITER_NODE_LOOP (res) = loop;
+  ITER_NODE_NITERS (res) = NULL;
+  ITER_NODE_PROLOGUE (res) = vNULL;
+  ITER_NODE_LOOP_PEEL_NEEDED (res) = 0;
+  ITER_NODE_LOOP_BODY (res) = vNULL;
+  ITER_NODE_PEELING_FOR_GAPS (res) = false;
+  ITER_NODE_PEELING_FOR_NITER (res) = false;
+  ITER_NODE_EPILOGUE (res) = vNULL;
+
+  bbs = get_loop_body (loop);
+
+  for (i = 0; i < loop->num_nodes; i++)
+  {
+    basic_block bb = bbs[i];
+
+    for (si = gsi_start_phis (bb); !gsi_end_p (si); gsi_next (&si))
+      {
+	gimple *phi = gsi_stmt (si);
+	gimple_set_uid (phi, 0);
+      }
+
+    for (si = gsi_start_bb (bb); !gsi_end_p (si); gsi_next (&si))
+      {
+	gimple *stmt = gsi_stmt (si);
+	gimple_set_uid (stmt, 0);
+      }
+  }
+
+  return res;
+}// new_iter_node
+
+static void
+destroy_iter_node_info (struct ITER_node *inode)
+{
+  int i;
+  basic_block *bbs;
+  gimple_stmt_iterator si;
+
+  if (ITER_NODE_PROLOGUE (inode) != vNULL)
+    ITER_NODE_PROLOGUE (inode).release ();
+  if (ITER_NODE_LOOP_BODY (inode) != vNULL)
+    ITER_NODE_LOOP_BODY (inode).release ();
+  if (ITER_NODE_EPILOGUE (inode) != vNULL)
+    ITER_NODE_EPILOGUE (inode).release ();
+
+  bbs = get_loop_body (ITER_NODE_LOOP (inode));
+  for (i = 0; i < ITER_NODE_LOOP (inode)->num_nodes; i++)
+  {
+    basic_block bb = bbs[i];
+
+    for (si = gsi_start_phis (bb); !gsi_end_p (si); gsi_next (&si))
+      {
+	gimple *phi = gsi_stmt (si);
+	gimple_set_uid (phi, 0);
+      }
+
+    for (si = gsi_start_bb (bb); !gsi_end_p (si); gsi_next (&si))
+      {
+	gimple *stmt = gsi_stmt (si);
+	gimple_set_uid (stmt, 0);
+      }
+  }
+
+  free (inode);
+} // destroy_iter_node_info
+
+static struct stmt_attr *
+new_stmt_attr ()
+{
+  struct stmt_attr *info;
+  info = (struct stmt_attr *) xcalloc (1, sizeof (struct stmt_attr));
+  info->use_type = stmt_use_type_undef;
+  info->access_fn = NULL;
+  info->ptree = NULL;
+  info->dr = NULL;
+  info->probable_root = false;
+  info->vectype = NULL;
+  return info;
+} // new_stmt_attr
+
+static struct ITER_node *
+vect_populate_iter_node_from_loop (struct loop *loop)
+{
+  tree number_of_iterations, number_of_iterationsm1;
+  basic_block *bbs;
+  gcond *loop_cond, *inner_loop_cond = NULL;
+  int i;
+  gimple_stmt_iterator si;
+
+  if (! vect_analyze_loop_form_1 (loop, &loop_cond, &number_of_iterationsm1,
+	    &number_of_iterations, &inner_loop_cond))
+    return NULL;
+
+  struct ITER_node * t_iter_node = new_iter_node (loop);
+  ITER_NODE_NITERS (t_iter_node) = number_of_iterations;
+
+  bbs = get_loop_body (loop);
+  for (i = 0; i < loop->num_nodes; i++)
+  {
+    basic_block bb = bbs[i];
+
+    for (si = gsi_start_phis (bb); !gsi_end_p (si); gsi_next (&si))
+      {
+	gimple *phi = gsi_stmt (si);
+	set_stmt_attr (phi, new_stmt_attr ());
+      }
+
+    for (si = gsi_start_bb (bb); !gsi_end_p (si); gsi_next (&si))
+      {
+	gimple *stmt = gsi_stmt (si);
+	set_stmt_attr (stmt, new_stmt_attr ());
+      }
+  }
+
+  if (!ITER_NODE_NITERS_KNOWN_P (t_iter_node))
+    {
+      if (dump_enabled_p ())
+	{
+	  dump_printf_loc (MSG_NOTE, vect_location,
+			   "Symbolic number of iterations is ");
+	  dump_generic_expr (MSG_NOTE, TDF_DETAILS, number_of_iterations);
+	  dump_printf (MSG_NOTE, "\n");
+	}
+    }
+
+  STMT_ATTR_USE_TYPE (loop_cond) = stmt_use_type_loop_exit_ctrl;
+  if (inner_loop_cond)
+     STMT_ATTR_USE_TYPE (inner_loop_cond) = stmt_use_type_loop_exit_ctrl;
+
+  gcc_assert (!loop->aux);
+  loop->aux = t_iter_node;
+  return t_iter_node;
+} // vect_populate_iter_node_from_loop
+
+static bool
+vect_analyze_dataref_access (struct data_reference *dr, struct loop * loop)
+{
+  tree step = DR_STEP (dr);
+  gimple *stmt = DR_STMT (dr);
+
+  if (!step)
+    {
+      if (dump_enabled_p ())
+	dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
+			 "bad data-ref access in loop\n");
+      return false;
+    }
+
+  if (integer_zerop (step))
+    {
+      /* Stride is 0, i.e. the data_ref is loop invariant.  So, writes cannot be
+	 vectorized.  */
+      if (nested_in_vect_loop_p (loop, stmt))
+	{
+	  if (dump_enabled_p ())
+	    dump_printf_loc (MSG_NOTE, vect_location,
+			     "zero step in inner loop of nest\n");
+
+	  if (!loop->force_vectorize)
+	    return false;
+	}
+
+	return DR_IS_READ (dr);
+    }
+
+  if (loop && nested_in_vect_loop_p (loop, stmt))
+    {
+      if (dump_enabled_p ())
+	dump_printf_loc (MSG_NOTE, vect_location,
+			 "grouped access in outer loop.\n");
+      return false;
+    }
+
+  /* For now, do not vectorize for variable step size.  */
+  if (TREE_CODE (step) != INTEGER_CST)
+    return false;
+
+  return true;
+} // vect_analyze_dataref_access
+
+static bool
+vect_analyze_dataref_accesses (struct ITER_node *inode)
+{
+  unsigned int i;
+  vec<data_reference_p> datarefs = ITER_NODE_DATA_REFS (inode);
+  struct data_reference *dr;
+
+  if (dump_enabled_p ())
+    dump_printf_loc (MSG_NOTE, vect_location,
+		     "=== vect_analyze_dataref_accesses ===\n");
+
+  if (datarefs.is_empty ())
+    return true;
+
+  FOR_EACH_VEC_ELT (datarefs, i, dr)
+    if (!vect_analyze_dataref_access (dr, ITER_NODE_LOOP (inode)))
+      {
+	if (dump_enabled_p ())
+	  dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
+			   "not vectorized: complicated access pattern.\n");
+
+	return false;
+      }
+
+  return true;
+} // vect_analyze_dataref_accesses
+
+static bool
+vect_analyze_datarefs (struct ITER_node *inode)
+{
+  struct loop *loop = ITER_NODE_LOOP (inode);
+  vec<data_reference_p> datarefs = ITER_NODE_DATA_REFS (inode);
+  tree scalar_type;
+  tree vec_type;
+  struct data_reference *dr;
+  unsigned int i;
+
+  FOR_EACH_VEC_ELT (datarefs, i, dr)
+    {
+      gimple *stmt;
+      //tree base, offset, init;
+      bool simd_lane_access = false;
+      int vf;
+
+
+again:
+      if (!dr || !DR_REF (dr))
+	{
+	  if (dump_enabled_p ())
+	    dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
+			     "not vectorized: unhandled data-ref\n");
+	  return false;
+	}
+
+      stmt = DR_STMT (dr);
+
+      /* Discard clobbers from the dataref vector.  We will remove
+	 clobber stmts during vectorization.  */
+      if (gimple_clobber_p (stmt))
+	{
+	  free_data_ref (dr);
+	  if (i == datarefs.length () - 1)
+	    {
+	      datarefs.pop ();
+	      break;
+	    }
+	  datarefs.ordered_remove (i);
+	  dr = datarefs[i];
+	  goto again;
+	}
+
+      /* Check that analysis of the data-ref succeeded.  */
+      if (!DR_BASE_ADDRESS (dr) || !DR_OFFSET (dr) || !DR_INIT (dr)
+	  || !DR_STEP (dr))
+	{
+	  if (dump_enabled_p ())
+	    {
+	      dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
+			       "not vectorized: data ref analysis "
+			       "failed ");
+	      dump_gimple_stmt (MSG_MISSED_OPTIMIZATION, TDF_SLIM, stmt, 0);
+	      dump_printf (MSG_MISSED_OPTIMIZATION, "\n");
+	    }
+
+	  return false;
+	}
+
+      if (TREE_CODE (DR_BASE_ADDRESS (dr)) == INTEGER_CST)
+	{
+	  if (dump_enabled_p ())
+	    dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
+			     "not vectorized: base addr of dr is a "
+			     "constant\n");
+
+	  return false;
+	}
+
+      if (TREE_THIS_VOLATILE (DR_REF (dr)))
+	{
+	  if (dump_enabled_p ())
+	    {
+	      dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
+			       "not vectorized: volatile type ");
+	      dump_gimple_stmt (MSG_MISSED_OPTIMIZATION, TDF_SLIM, stmt, 0);
+	      dump_printf (MSG_MISSED_OPTIMIZATION, "\n");
+	    }
+
+	  return false;
+	}
+
+      if (stmt_can_throw_internal (stmt))
+	{
+	  if (dump_enabled_p ())
+	    {
+	      dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
+			       "not vectorized: statement can throw an "
+			       "exception ");
+	      dump_gimple_stmt (MSG_MISSED_OPTIMIZATION, TDF_SLIM, stmt, 0);
+	      dump_printf (MSG_MISSED_OPTIMIZATION, "\n");
+	    }
+
+	  return false;
+	}
+
+      if (TREE_CODE (DR_REF (dr)) == COMPONENT_REF
+	  && DECL_BIT_FIELD (TREE_OPERAND (DR_REF (dr), 1)))
+	{
+	  if (dump_enabled_p ())
+	    {
+	      dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
+			       "not vectorized: statement is bitfield "
+			       "access ");
+	      dump_gimple_stmt (MSG_MISSED_OPTIMIZATION, TDF_SLIM, stmt, 0);
+	      dump_printf (MSG_MISSED_OPTIMIZATION, "\n");
+	    }
+
+	  return false;
+	}
+
+      //base = unshare_expr (DR_BASE_ADDRESS (dr));
+      //offset = unshare_expr (DR_OFFSET (dr));
+      //init = unshare_expr (DR_INIT (dr));
+
+      if (is_gimple_call (stmt)
+	  && (!gimple_call_internal_p (stmt)
+	      || (gimple_call_internal_fn (stmt) != IFN_MASK_LOAD
+		  && gimple_call_internal_fn (stmt) != IFN_MASK_STORE)))
+	{
+	  if (dump_enabled_p ())
+	    {
+	      dump_printf_loc (MSG_MISSED_OPTIMIZATION,  vect_location,
+			       "not vectorized: dr in a call ");
+	      dump_gimple_stmt (MSG_MISSED_OPTIMIZATION, TDF_SLIM, stmt, 0);
+	      dump_printf (MSG_MISSED_OPTIMIZATION, "\n");
+	    }
+
+	  return false;
+	}
+
+      /* If the dataref is in an inner-loop of the loop that is considered for
+	 for vectorization, we also want to analyze the access relative to
+	 the outer-loop (DR contains information only relative to the
+	 inner-most enclosing loop).  We do that by building a reference to the
+	 first location accessed by the inner-loop, and analyze it relative to
+	 the outer-loop.  */
+      if (loop && nested_in_vect_loop_p (loop, stmt))
+	{
+ 	  /* Do nothing for now, as the purpose is unclear.  */
+#if 0
+	  /* Build a reference to the first location accessed by the
+	     inner-loop: *(BASE+INIT).  (The first location is actually
+	     BASE+INIT+OFFSET, but we add OFFSET separately later).  */
+	  tree inner_base = build_fold_indirect_ref
+				(fold_build_pointer_plus (base, init));
+
+	  if (dump_enabled_p ())
+	    {
+	      dump_printf_loc (MSG_NOTE, vect_location,
+			       "analyze in outer-loop: ");
+	      dump_generic_expr (MSG_NOTE, TDF_SLIM, inner_base);
+	      dump_printf (MSG_NOTE, "\n");
+	    }
+
+	  outer_base = get_inner_reference (inner_base, &pbitsize, &pbitpos,
+					    &poffset, &pmode, &punsignedp,
+					    &preversep, &pvolatilep, false);
+	  gcc_assert (outer_base != NULL_TREE);
+
+	  if (pbitpos % BITS_PER_UNIT != 0)
+	    {
+	      if (dump_enabled_p ())
+		dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
+				 "failed: bit offset alignment.\n");
+	      return false;
+	    }
+
+	  if (preversep)
+	    {
+	      if (dump_enabled_p ())
+		dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
+				 "failed: reverse storage order.\n");
+	      return false;
+	    }
+
+	  outer_base = build_fold_addr_expr (outer_base);
+	  if (!simple_iv (loop, loop_containing_stmt (stmt), outer_base,
+			  &base_iv, false))
+	    {
+	      if (dump_enabled_p ())
+		dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
+				 "failed: evolution of base is not affine.\n");
+	      return false;
+	    }
+
+	  if (offset)
+	    {
+	      if (poffset)
+		poffset = fold_build2 (PLUS_EXPR, TREE_TYPE (offset), offset,
+				       poffset);
+	      else
+		poffset = offset;
+	    }
+
+	  if (!poffset)
+	    {
+	      offset_iv.base = ssize_int (0);
+	      offset_iv.step = ssize_int (0);
+	    }
+	  else if (!simple_iv (loop, loop_containing_stmt (stmt), poffset,
+			       &offset_iv, false))
+	    {
+	      if (dump_enabled_p ())
+		dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
+				 "evolution of offset is not affine.\n");
+	      return false;
+	    }
+
+	  outer_init = ssize_int (pbitpos / BITS_PER_UNIT);
+	  split_constant_offset (base_iv.base, &base_iv.base, &dinit);
+	  outer_init =  size_binop (PLUS_EXPR, outer_init, dinit);
+	  split_constant_offset (offset_iv.base, &offset_iv.base, &dinit);
+	  outer_init =  size_binop (PLUS_EXPR, outer_init, dinit);
+
+	  outer_step = size_binop (PLUS_EXPR,
+				fold_convert (ssizetype, base_iv.step),
+				fold_convert (ssizetype, offset_iv.step));
+#endif
+	}
+
+      STMT_ATTR_DR (stmt) = dr;
+
+      if (simd_lane_access)
+	{
+	  free_data_ref (datarefs[i]);
+	  datarefs[i] = dr;
+	}
+
+      /* Set vectype for STMT.  */
+      scalar_type = TREE_TYPE (DR_REF (dr));
+      vec_type = get_vectype_for_scalar_type (scalar_type);
+      if (!vec_type)
+	{
+	  if (dump_enabled_p ())
+	    {
+	      dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
+			       "not vectorized: no vectype for stmt: ");
+	      dump_gimple_stmt (MSG_MISSED_OPTIMIZATION, TDF_SLIM, stmt, 0);
+	      dump_printf (MSG_MISSED_OPTIMIZATION, " scalar_type: ");
+	      dump_generic_expr (MSG_MISSED_OPTIMIZATION, TDF_DETAILS,
+				 scalar_type);
+	      dump_printf (MSG_MISSED_OPTIMIZATION, "\n");
+	    }
+
+	  return false;
+	}
+      else
+	{
+	  if (dump_enabled_p ())
+	    {
+	      dump_printf_loc (MSG_NOTE, vect_location,
+			       "got vectype for stmt: ");
+	      dump_gimple_stmt (MSG_NOTE, TDF_SLIM, stmt, 0);
+	      dump_generic_expr (MSG_NOTE, TDF_SLIM,
+				 vec_type);
+	      dump_printf (MSG_NOTE, "\n");
+	    }
+	}
+
+      /* Adjust the minimal vectorization factor according to the
+	 vector type.  */
+      vf = TYPE_VECTOR_SUBPARTS (vec_type);
+
+      if (TREE_CODE (DR_STEP (dr)) != INTEGER_CST)
+	{
+	  if (nested_in_vect_loop_p (loop, stmt))
+	    {
+	      if (dump_enabled_p ())
+		{
+		  dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
+				   "not vectorized: not suitable for strided "
+				   "load ");
+		  dump_gimple_stmt (MSG_MISSED_OPTIMIZATION, TDF_SLIM, stmt, 0);
+		  dump_printf (MSG_MISSED_OPTIMIZATION, "\n");
+		}
+	      return false;
+	    }
+	}
+    }
+
+  return true;
+} // vect_analyze_datarefs
+
+static bool
+is_raw_dependence (struct data_reference *dra, struct data_reference *drb)
+{
+  gimple *earlier;
+  struct data_reference *earlier_ref;
+
+  /* Identify first reference, and check if read-after-write condition.  */
+  earlier = get_earlier_stmt (DR_STMT (dra), DR_STMT (drb));
+  if (earlier)
+    {
+      if (earlier == DR_STMT (dra))
+	earlier_ref = dra;
+      else
+	earlier_ref = drb;
+
+      if (DR_IS_WRITE (earlier_ref))
+	{
+	  if (dump_enabled_p ())
+	    dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
+			     "READ_WRITE dependence in interleaving."
+			     "\n");
+	  return true;
+	}
+    }
+  return false;
+} // is_raw_dependence
+
+static bool
+vect_analyze_dataref_dependence (struct data_dependence_relation *ddr,
+				 struct ITER_node *inode)
+{
+  unsigned int i;
+  struct loop *loop = ITER_NODE_LOOP (inode);
+  struct data_reference *dra = DDR_A (ddr);
+  struct data_reference *drb = DDR_B (ddr);
+  lambda_vector dist_v;
+  unsigned int loop_depth;
+
+  /* Independent data accesses.  */
+  if (DDR_ARE_DEPENDENT (ddr) == chrec_known)
+    return false;
+
+  if (dra == drb
+      || (DR_IS_READ (dra) && DR_IS_READ (drb)))
+    return false;
+
+  /* Even if we have an anti-dependence then, as the vectorized loop covers at
+     least two scalar iterations, there is always also a true dependence.
+     As the vectorizer does not re-order loads and stores we can ignore
+     the anti-dependence if TBAA can disambiguate both DRs similar to the
+     case with known negative distance anti-dependences (positive
+     distance anti-dependences would violate TBAA constraints).  */
+  if (((DR_IS_READ (dra) && DR_IS_WRITE (drb))
+       || (DR_IS_WRITE (dra) && DR_IS_READ (drb)))
+      && !alias_sets_conflict_p (get_alias_set (DR_REF (dra)),
+				 get_alias_set (DR_REF (drb))))
+    return false;
+
+  /* Unknown data dependence.  */
+  if (DDR_ARE_DEPENDENT (ddr) == chrec_dont_know)
+    {
+      /* If user asserted safelen consecutive iterations can be
+	 executed concurrently, assume independence.  */
+      if (loop->safelen >= 2)
+	{
+	  return false;
+	}
+
+      if (dump_enabled_p ())
+	{
+	  dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
+			   "versioning for alias required: "
+			   "can't determine dependence between ");
+	  dump_generic_expr (MSG_MISSED_OPTIMIZATION, TDF_SLIM,
+			     DR_REF (dra));
+	  dump_printf (MSG_MISSED_OPTIMIZATION, " and ");
+	  dump_generic_expr (MSG_MISSED_OPTIMIZATION, TDF_SLIM,
+			     DR_REF (drb));
+	  dump_printf (MSG_MISSED_OPTIMIZATION, "\n");
+	}
+
+      /* Add to list of ddrs that need to be tested at run-time.  */
+      return !vect_mark_for_runtime_alias_test_1 (ddr, loop);
+    }
+
+  /* Known data dependence.  */
+  if (DDR_NUM_DIST_VECTS (ddr) == 0)
+    {
+      /* If user asserted safelen consecutive iterations can be
+	 executed concurrently, assume independence.  */
+      if (loop->safelen >= 2)
+	{
+	  return false;
+	}
+
+      if (dump_enabled_p ())
+	{
+	  dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
+			   "versioning for alias required: "
+			   "bad dist vector for ");
+	  dump_generic_expr (MSG_MISSED_OPTIMIZATION, TDF_SLIM, DR_REF (dra));
+	  dump_printf (MSG_MISSED_OPTIMIZATION,  " and ");
+	  dump_generic_expr (MSG_MISSED_OPTIMIZATION, TDF_SLIM, DR_REF (drb));
+	  dump_printf (MSG_MISSED_OPTIMIZATION, "\n");
+	}
+      /* Add to list of ddrs that need to be tested at run-time.  */
+      return !vect_mark_for_runtime_alias_test_1 (ddr, loop);
+    }
+
+  loop_depth = index_in_loop_nest (loop->num, DDR_LOOP_NEST (ddr));
+  FOR_EACH_VEC_ELT (DDR_DIST_VECTS (ddr), i, dist_v)
+    {
+      int dist = dist_v[loop_depth];
+
+      if (dump_enabled_p ())
+	dump_printf_loc (MSG_NOTE, vect_location,
+			 "dependence distance  = %d.\n", dist);
+
+      if (dist == 0)
+	{
+	  if (dump_enabled_p ())
+	    {
+	      dump_printf_loc (MSG_NOTE, vect_location,
+			       "dependence distance == 0 between ");
+	      dump_generic_expr (MSG_NOTE, TDF_SLIM, DR_REF (dra));
+	      dump_printf (MSG_NOTE, " and ");
+	      dump_generic_expr (MSG_NOTE, TDF_SLIM, DR_REF (drb));
+	      dump_printf (MSG_MISSED_OPTIMIZATION, "\n");
+	    }
+
+	  if (is_raw_dependence (dra, drb))
+	    {
+	      if (dump_enabled_p ())
+		dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
+				 "READ_WRITE dependence in interleaving."
+				 "\n");
+	      return true;
+	    }
+
+	  continue;
+	}
+
+      if (dist > 0 && DDR_REVERSED_P (ddr))
+	{
+	  /* If DDR_REVERSED_P the order of the data-refs in DDR was
+	     reversed (to make distance vector positive), and the actual
+	     distance is negative.  */
+	  if (dump_enabled_p ())
+	    dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
+			     "dependence distance negative.\n");
+	  continue;
+	}
+
+      if (dump_enabled_p ())
+	{
+	  dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
+		       "not vectorized, possible dependence "
+		       "between data-refs ");
+	  dump_generic_expr (MSG_NOTE, TDF_SLIM, DR_REF (dra));
+	  dump_printf (MSG_NOTE,  " and ");
+	  dump_generic_expr (MSG_NOTE, TDF_SLIM, DR_REF (drb));
+	  dump_printf (MSG_NOTE,  "\n");
+	}
+
+      return true;
+    }
+
+  return false;
+} // vect_analyze_dataref_dependence
+
+static bool
+vect_analyze_dataref_dependences (struct ITER_node *inode,
+				  vec<loop_p> loop_nest)
+{
+  unsigned int i;
+  struct data_dependence_relation *ddr;
+
+  if (dump_enabled_p ())
+    dump_printf_loc (MSG_NOTE, vect_location,
+		     "=== vect_analyze_data_ref_dependences ===\n");
+
+  ITER_NODE_DATA_DEPS (inode).create (ITER_NODE_DATA_REFS (inode).length ()
+	     * ITER_NODE_DATA_REFS (inode).length ());
+  if (!compute_all_dependences (ITER_NODE_DATA_REFS (inode),
+				&ITER_NODE_DATA_DEPS (inode),
+				loop_nest, true))
+    return false;
+
+  FOR_EACH_VEC_ELT (ITER_NODE_DATA_DEPS (inode), i, ddr)
+    if (vect_analyze_dataref_dependence (ddr, inode))
+      return false;
+
+  return true;
+} // vect_analyze_dataref_dependences
+
+
+
+
+static void
+mark_addr_index_stmt (tree use, struct loop *loop)
+{
+  return;
+}
+
+/*
+  The computations within the loop can be classified as:
+  - Induction : Of the following form, where v1 is loop invariant.
+
+		var_phi = PHI (var_loop, var_inv)
+		:
+		var_loop = op (var_phi, v1)
+
+  - Reduction : Of the following form, where the reduction variable var_loop
+		or the reduction component var_phi has a single use.  v1 can
+		be any vectorizable statement.
+
+		var_phi = PHI (var_loop, var_inv)
+		:
+		var_loop = op (var_phi, v1)
+		:
+		var_out_phi = PHI (var_loop)
+
+  - Intermediate : Intermediate computations defining ssa_vars which are not
+		   live beyond the loop.  The operands can be loop invariant/
+		   results of computations within the loop/PHI nodes/constant/
+		   memory.
+
+  - Loop invariant : Constant/memory references/ssa_vars with computations
+		     outside the loop.  There is no redefinition of components
+		     of computations within the loop.
+
+  - Scalar : The statements which contribute to address computations or other
+	     accesses which need not be vectorized.
+
+  - Complex : Not vectorizable.
+
+  FUNCTION classify_loop_stmt ():
+  - The loop statement with all operands on RHS as loop invariant (constants or
+    ssa_vars defined outside loop) are marked as loop invariant.  RO memory can
+    be marked as loop invariant.  Currently, RW memory is not marked as loop
+    invariant.
+
+  - The loop statement which is not vectorizable is marked as complex, and
+    vectorization is terminated.
+
+  - All other loop statements which have non-PHI nodes as output are marked as
+    intermediate.
+
+  - If any PHI node, which is neither reduction nor induction variable,
+    vectorization is terminated.
+  ENDFUNCTION
+*/
+static bool
+classify_loop_stmt (gimple *stmt, struct loop * loop)
+{
+  const char *attr_name[] = {"Undefined", "Scalar", "loop_invariant",
+			     "induction", "reduction", "intermediate",
+			     "complex", "loop_exit_control"};
+  enum stmt_use_type stmt_type, def_type;
+  use_operand_p use_p;
+  ssa_op_iter iter;
+
+  if (gimple_has_volatile_ops (stmt))
+    {
+      if (dump_enabled_p ())
+	dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
+			 "not vectorized: stmt has volatile operand\n");
+      return false;
+    }
+
+  if (STMT_ATTR_USE_TYPE (stmt) != stmt_use_type_undef
+      && STMT_ATTR_USE_TYPE (stmt) != stmt_use_type_scalar)
+    {
+      if (dump_enabled_p ())
+	{
+	  dump_printf_loc (MSG_NOTE, vect_location, " classify_loop_stmt: ");
+	  dump_gimple_stmt (MSG_NOTE, TDF_SLIM, stmt, 0);
+ 	  dump_printf_loc (MSG_NOTE, vect_location,
+			   "  Preprocessed : %s\n",
+			   attr_name[STMT_ATTR_USE_TYPE (stmt)]);
+	}
+      return (STMT_ATTR_USE_TYPE (stmt) != stmt_use_type_complex);
+    }
+
+  if (gimple_code (stmt) == GIMPLE_PHI)
+    {
+      if (dump_enabled_p ())
+	{
+	  dump_printf_loc (MSG_NOTE, vect_location, " classify_loop_stmt: ");
+	  dump_gimple_stmt (MSG_NOTE, TDF_SLIM, stmt, 0);
+	  dump_printf_loc (MSG_NOTE, vect_location,
+			   "  PHI : %s\n",
+			   attr_name[STMT_ATTR_USE_TYPE (stmt)]);
+ 	}
+      return true;
+    }
+
+  /* The statement lies outside the loop, so there is no need to analyze it
+     further.  */
+  if (!flow_bb_inside_loop_p (loop, gimple_bb (stmt)))
+    {
+      STMT_ATTR_USE_TYPE (stmt) = stmt_use_type_loop_invariant;
+      if (dump_enabled_p ())
+	{
+	  dump_printf_loc (MSG_NOTE, vect_location, " classify_loop_stmt: ");
+	  dump_gimple_stmt (MSG_NOTE, TDF_SLIM, stmt, 0);
+	  dump_printf_loc (MSG_NOTE, vect_location,
+		   "  Statement outside the loop under consideration.\n");
+	  dump_printf_loc (MSG_NOTE, vect_location,
+			   "  : %s\n",
+			   attr_name[STMT_ATTR_USE_TYPE (stmt)]);
+	}
+      return true;
+    }
+
+  /* If there is a DR in STMT, there is a LOAD/STORE operation.  Mark the
+     instructions computing its address as non-vectorizable/scalar.  */
+  if (STMT_ATTR_DR (stmt) && gimple_assign_single_p (stmt))
+    {
+      tree scalar_type;
+      tree op0 = gimple_assign_lhs (stmt);
+      tree op1 = gimple_assign_rhs1 (stmt);
+      enum tree_code code;
+
+      if (TREE_CODE (op0) == SSA_NAME)
+	{
+	  code = TREE_CODE (op1);
+	  if (code == ARRAY_REF
+      	      || code == BIT_FIELD_REF
+ 	      || code == INDIRECT_REF
+      	      || code == COMPONENT_REF
+      	      || code == IMAGPART_EXPR
+      	      || code == REALPART_EXPR
+      	      || code == MEM_REF
+	      || TREE_CODE_CLASS (code) == tcc_declaration)
+	    {
+	      /* LOAD - trace SSA_NAME for address if any.  Mark it as scalar if
+		 not marked already.  For loads, no need to analyze uses.  */
+	      mark_addr_index_stmt (TREE_OPERAND (op1, 0), loop);
+	      STMT_ATTR_USE_TYPE (stmt) = stmt_use_type_intermediate;
+	      scalar_type = TREE_TYPE (DR_REF (STMT_ATTR_DR (stmt)));
+	      STMT_ATTR_VECTYPE (stmt) =
+			 get_vectype_for_scalar_type (scalar_type);
+
+	      if (dump_enabled_p ())
+		{
+		  dump_printf_loc (MSG_NOTE, vect_location,
+				   " classify_loop_stmt: ");
+		  dump_gimple_stmt (MSG_NOTE, TDF_SLIM, stmt, 0);
+		  dump_printf_loc (MSG_NOTE, vect_location,
+				   " : %s\n",
+				   attr_name[STMT_ATTR_USE_TYPE (stmt)]);
+		}
+	      return true;
+	    }
+	  /* TODO: Conversion needs to be handled here.  */
+	  if (dump_enabled_p ())
+	    dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
+			     "Expected load.\n");
+	  return false;
+	}
+      else
+	{
+	  code = TREE_CODE (op0);
+	  if (code == ARRAY_REF
+	      || code == BIT_FIELD_REF
+	      || code == INDIRECT_REF
+	      || code == COMPONENT_REF
+	      || code == IMAGPART_EXPR
+	      || code == REALPART_EXPR
+	      || code == MEM_REF
+	      || TREE_CODE_CLASS (code) == tcc_declaration)
+	    {
+	      /* STORE - Trace SSA_NAME for address if any.  Mark it as scalar
+		 if not marked already.  Do not return, as we need to analyze
+		 uses.  However, the statement itself is marked of type
+		 intermediate, so that it is not marked as loop invariant.  */
+	      mark_addr_index_stmt (TREE_OPERAND (op0, 0), loop);
+	      STMT_ATTR_USE_TYPE (stmt) = stmt_use_type_intermediate;
+	    }
+	  else
+	    {
+	      if (dump_enabled_p ())
+		dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
+				 "Expected store.\n");
+
+	      return false;
+	    }
+	}
+    }
+
+  /* TODO: PAREN_EXPR and CONVERT_EXPR_CODE_P may need to be handled here;
+     marking the location for now.  */
+  bool retval = true;
+  tree vec_type = NULL;
+  stmt_type = STMT_ATTR_USE_TYPE (stmt);
+  FOR_EACH_PHI_OR_STMT_USE (use_p, stmt, iter, SSA_OP_USE)
+    {
+      tree op = USE_FROM_PTR (use_p);
+      enum tree_code code = TREE_CODE (op);
+
+      if (code == SSA_NAME)
+	{
+	  if (!SSA_NAME_IS_DEFAULT_DEF (op))
+	    {
+	      gimple *def_stmt = SSA_NAME_DEF_STMT (op);
+	      retval &= classify_loop_stmt (def_stmt, loop);
+
+	      def_type = STMT_ATTR_USE_TYPE (def_stmt);
+
+	      if (stmt_type > stmt_use_type_scalar && stmt_type != def_type)
+		stmt_type = stmt_use_type_intermediate;
+	      else
+		stmt_type = def_type;
+
+	      vec_type = STMT_ATTR_VECTYPE (def_stmt);
+ 	      if (STMT_ATTR_VECTYPE (stmt) && vec_type
+		  && !useless_type_conversion_p (vec_type,
+						 STMT_ATTR_VECTYPE (stmt)))
+		{
+		  if (dump_enabled_p ())
+		    dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
+			"Not vectorized: Types of operands do not match.\n");
+		  return false;
+		}
+	      STMT_ATTR_VECTYPE (stmt) = vec_type;
+	    }
+	}
+      else if (CONSTANT_CLASS_P (op) || is_gimple_min_invariant (op))
+	{
+	  if (stmt_type <= stmt_use_type_loop_invariant)
+	    stmt_type = stmt_use_type_loop_invariant;
+	  else
+	    stmt_type = stmt_use_type_intermediate;
+	}
+    }
+
+  /* Once all the USEs of stmt are marked, mark the type of this statement.  */
+  STMT_ATTR_USE_TYPE (stmt) = stmt_type;
+
+  if (stmt_type == stmt_use_type_undef)
+    {
+      if (dump_enabled_p ())
+	dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
+		 "Not vectorized: stmt use type UNDEF at end of processing.\n");
+      return false;
+    }
+
+  if (dump_enabled_p ())
+    {
+      dump_printf_loc (MSG_NOTE, vect_location, " classify_loop_stmt: ");
+      dump_gimple_stmt (MSG_NOTE, TDF_SLIM, stmt, 0);
+      dump_printf_loc (MSG_NOTE, vect_location,
+		       "  : %s\n",
+		       attr_name[STMT_ATTR_USE_TYPE (stmt)]);
+    }
+
+  return retval;
+}
+
+/*
+  FUNCTION classify_loop_stmts () -
+  - Analyze PHI nodes in loop header to identify induction variables.  As
+    induction variables have initial definition and a definition within loop,
+    each induction variable ought to appear in the PHI nodes of the loop
+    header.  Hence, we need not identify them from the loop body.
+
+  - If the induction variable has variable step, ILV node creation is not
+    possible (for now), as the arity depends upon the stepsize.  It can still be
+    reduction variable, hence decision of vectorization is not taken yet.
+
+  - The variables that are not induction variables, are checked if they are
+    reduction variables - the reduction operation is vectorizable only if the
+    reduction variable does not have any use apart from that in outgoing PHI
+    node.
+
+  - Invoke function classify_loop_stmt () to classify each statement in
+    PROBABLE_ROOTS list recursively so that all the USEs in the statement are
+    processed.
+  ENDFUNCTION
+*/
+
+static bool
+classify_loop_stmts (struct ITER_node *inode, vec<gimple *> *probable_roots)
+{
+  basic_block *bbs;
+  int nbbs;
+  struct loop *loop = ITER_NODE_LOOP (inode);
+  vec<gimple *> worklist = vNULL;
+
+  bbs = get_loop_body (loop);
+  nbbs = loop->num_nodes;
+  if (dump_enabled_p ())
+    {
+      dump_printf_loc (MSG_NOTE, vect_location,
+		       "==== classify_loop_stmts ====\n");
+    }
+
+  /* Mark PHI nodes.  */
+  for (gphi_iterator si = gsi_start_phis (loop->header);
+       !gsi_end_p (si); gsi_next (&si))
+    {
+      gphi *phi = si.phi ();
+      tree access_fn = NULL;
+      tree def = PHI_RESULT (phi);
+      tree init, step;
+
+      if (dump_enabled_p ())
+	{
+	  dump_printf_loc (MSG_NOTE, vect_location, "Analyze phi: ");
+	  dump_gimple_stmt (MSG_NOTE, TDF_SLIM, phi, 0);
+	}
+
+      if (virtual_operand_p (def))
+	continue;
+
+      access_fn = analyze_scalar_evolution (loop, def);
+      if (access_fn)
+	{
+	  STMT_ATTR_ACCESS_FN (phi) = access_fn;
+	}
+      else
+	{
+	  worklist.safe_push (phi);
+	  continue;
+	}
+
+      if (vect_is_simple_iv_evolution (loop->num, access_fn, &init, &step))
+	{
+	  STMT_ATTR_USE_TYPE (phi) = stmt_use_type_induction;
+	  if (dump_enabled_p ())
+	    {
+	      dump_printf_loc (MSG_NOTE, vect_location,
+			       "Detected induction.\n");
+	      dump_printf (MSG_NOTE, "\n");
+	    }
+	}
+      else
+	worklist.safe_push (phi);
+    }
+#define REDUCTION_COMPLETE 0
+#if REDUCTION_COMPLETE
+  while (worklist.length () > 0)
+    {
+      gimple *phi = worklist.pop ();
+      tree def = PHI_RESULT (phi);
+
+      if (dump_enabled_p ())
+	{
+	  dump_printf_loc (MSG_NOTE, vect_location,
+			   "Analyze phi for reduction: ");
+	  dump_gimple_stmt (MSG_NOTE, TDF_SLIM, phi, 0);
+	}
+
+      gcc_assert (!virtual_operand_p (def)
+		  && STMT_ATTR_USE_TYPE (phi) == stmt_use_type_undef);
+
+      reduc_stmt = vect_simple_reduction ();
+
+      if (reduc_stmt)
+	{
+	}
+      else
+	{
+	  if (dump_enabled_p ())
+	    {
+	      dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
+			       "Unknown def-use cycle pattern.\n");
+	      dump_printf (MSG_NOTE, "\n");
+	    }
+	}
+    }
+#else
+  if (worklist.length () > 0)
+    {
+      if (dump_enabled_p ())
+	{
+  	  dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
+	    "Reduction not supported yet.  Unknown def-use cycle pattern.\n");
+	  dump_printf (MSG_NOTE, "\n");
+	}
+      return false;
+    }
+#endif
+
+  worklist = (*probable_roots).copy ();
+  while (worklist.length () > 0)
+    {
+      gimple *stmt = worklist.pop ();
+
+      if (!classify_loop_stmt (stmt, loop))
+	return false;
+    }
+
+  return true;
+}
+
+/*
+  The loop has side effects only if
+  - Loop writes to memory location.
+  - The computations within loop are live after the loop.
+
+  This function identifies such nodes.
+
+  FUNCTION mark_probable_root_nodes () -
+    FORALL statements within loop DO
+    - If the result of statement is in outgoing PHI node (live across loop body)
+      mark the statement for probable root p-node.
+
+    - If the result of statement is memory, mark the statement as probable root
+      p-node.
+
+    - If the loop is inner loop, and any statement uses induction variable of
+      outer loop - because of which stride > vector_size, do not vectorize.
+  ENDFUNCTION
+*/
+static void
+mark_probable_root_nodes (struct ITER_node *inode,
+			  vec<gimple *> *probable_roots)
+{
+  int i;
+  gimple *stmt;
+  ssa_op_iter op_iter;
+  imm_use_iterator imm_iter;
+  use_operand_p use_p;
+  def_operand_p def_p;
+  struct loop *loop = ITER_NODE_LOOP (inode);
+  int nbbs = loop->num_nodes;
+  basic_block *bbs = get_loop_body (ITER_NODE_LOOP (inode));
+
+  for (i = 0; i < nbbs; i++)
+    {
+      basic_block bb = bbs[i];
+      gimple_stmt_iterator si;
+
+      for (si = gsi_start_bb (bb); !gsi_end_p (si); gsi_next (&si))
+	{
+	  stmt = gsi_stmt (si);
+	  if (dump_enabled_p ())
+	    {
+	      dump_printf_loc (MSG_NOTE, vect_location,
+		 " mark_probable_root_nodes: ");
+	      dump_gimple_stmt (MSG_NOTE, TDF_SLIM, stmt, 0);
+	    }
+
+	  if (gimple_has_volatile_ops (stmt))
+	    continue;
+
+	  if (gimple_vdef (stmt) && !gimple_clobber_p (stmt))
+	    {
+       	      if (dump_enabled_p ())
+		dump_printf_loc (MSG_NOTE, vect_location,
+      				 "  : Memory def.\n");
+	      probable_roots->safe_push (stmt);
+	      STMT_ATTR_PROOT (stmt) = true;
+	    }
+
+
+	  FOR_EACH_PHI_OR_STMT_DEF (def_p, stmt, op_iter, SSA_OP_DEF)
+	    {
+	      FOR_EACH_IMM_USE_FAST (use_p, imm_iter, DEF_FROM_PTR (def_p))
+		{
+		  basic_block bb = gimple_bb (USE_STMT (use_p));
+		  if (!flow_bb_inside_loop_p (loop, bb))
+		    {
+		      if (dump_enabled_p ())
+			dump_printf_loc (MSG_NOTE, vect_location,
+	  				 "  : Live beyond loop.\n");
+
+		      if (is_gimple_debug (USE_STMT (use_p)))
+	    		continue;
+
+			/* We expect all such uses to be in the loop exit phis
+			   (because of loop-closed SSA form).  */
+		      gcc_assert (gimple_code (USE_STMT (use_p)) == GIMPLE_PHI);
+	  	      gcc_assert (bb == single_exit (loop)->dest);
+
+		      probable_roots->safe_push (stmt);
+		      STMT_ATTR_PROOT (stmt) = true;
+	    	    }
+		}
+    	    }
+	}
+    }
+}
+
+bool
+vect_is_simple_use (tree operand, gimple **def_stmt)
+{
+  *def_stmt = NULL;
+
+  if (CONSTANT_CLASS_P (operand))
+    return true;
+  if (is_gimple_min_invariant (operand))
+    return true;
+  if (TREE_CODE (operand) != SSA_NAME)
+    return false;
+  if (SSA_NAME_IS_DEFAULT_DEF (operand))
+    return true;
+
+  *def_stmt = SSA_NAME_DEF_STMT (operand);
+
+  switch (gimple_code (*def_stmt))
+    {
+    case GIMPLE_PHI:
+    case GIMPLE_ASSIGN:
+    case GIMPLE_CALL:
+      break;
+    default:
+      if (dump_enabled_p ())
+	dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
+			 "unsupported defining stmt:\n");
+      return false;
+    }
+  return true;
+}
+
+/* TODO: Replace this stub with actual folded address computations.  This is
+   only for testing purposes.  I know the memref node created with this is
+   incorrect.  */
+static tree
+normalize_base_addr (gimple *stmt, struct loop *loop, tree *offset)
+{
+  tree addr, var, constant, retval;
+#if 0
+  if (dump_enabled_p ())
+    {
+      dump_printf_loc (MSG_NOTE, vect_location,
+		       "normalize_base_addr: ");
+      dump_generic_expr (MSG_NOTE, TDF_SLIM,
+			 DR_BASE_ADDRESS (STMT_ATTR_DR (stmt)));
+      dump_generic_expr (MSG_NOTE, TDF_SLIM,
+			 DR_INIT (STMT_ATTR_DR (stmt)));
+      dump_generic_expr (MSG_NOTE, TDF_SLIM,
+			 DR_OFFSET (STMT_ATTR_DR (stmt)));
+      dump_printf (MSG_NOTE, "\n");
+    }
+#endif
+
+
+  addr = size_binop (PLUS_EXPR,
+	      fold_convert (ssizetype, DR_BASE_ADDRESS (STMT_ATTR_DR (stmt))),
+	      size_binop (PLUS_EXPR,
+	  	   fold_convert (ssizetype, DR_INIT (STMT_ATTR_DR (stmt))),
+		   fold_convert (ssizetype, DR_OFFSET (STMT_ATTR_DR (stmt)))));
+
+  if (dump_enabled_p ())
+    {
+      dump_printf_loc (MSG_NOTE, vect_location,
+		       "normalize_base_addr: ");
+      dump_generic_expr (MSG_NOTE, TDF_SLIM,
+			 addr);
+      dump_printf (MSG_NOTE, "\n");
+    }
+
+  split_constant_offset (addr, &var, &constant);
+
+  if (tree_fits_shwi_p (constant)
+      && tree_fits_shwi_p (DR_STEP (STMT_ATTR_DR (stmt))))
+    {
+      int c, step;
+      c = tree_to_shwi (constant);
+      step = tree_to_shwi (DR_STEP (STMT_ATTR_DR (stmt)));
+
+      if (c >= step)
+	{
+	  *offset = ssize_int (c % step);
+	  retval = size_binop (PLUS_EXPR, var, ssize_int (c / step * step));
+	}
+      else
+	{
+	  *offset = constant;
+	  retval = var;
+	}
+
+    }
+  else
+    {
+      *offset = ssize_int (0);
+      retval = size_binop (PLUS_EXPR, var, constant);
+    }
+
+  if (dump_enabled_p ())
+    {
+      dump_printf_loc (MSG_NOTE, vect_location,
+		       "\tbase: ");
+      dump_generic_expr (MSG_NOTE, TDF_SLIM,
+			 retval);
+      dump_printf (MSG_NOTE, "\toffset: ");
+      dump_generic_expr (MSG_NOTE, TDF_SLIM,
+			 *offset);
+      dump_printf (MSG_NOTE, "\n");
+    }
+  return retval;
+}
+
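+/* Build a mem_ref descriptor for the data reference of STMT in LOOP.  IS_READ
+   distinguishes loads from stores.  The stride and offset are additionally
+   recorded as integers in units of the element size when they are compile-time
+   constants, and as -1 otherwise.
+
+   Illustrative example (hypothetical values, not from an actual testcase):
+   for a 4-byte element access a[2*i + 3], the data-ref gives base &a,
+   init 12 and step 8, so normalize_base_addr returns base &a + 8 with
+   offset 4, and the mem_ref ends up with int_mult_idx = 8/4 = 2 and
+   int_offset = 4/4 = 1.  */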
+static struct mem_ref *
+vect_create_memref_ptr (gimple *stmt, struct loop *loop, bool is_read)
+{
+  struct mem_ref * memref;
+  tree offset;
+  int num =
+	 GET_MODE_SIZE (TYPE_MODE (TREE_TYPE (DR_REF (STMT_ATTR_DR (stmt)))));
+  memref = (struct mem_ref *) xcalloc (1, sizeof (struct mem_ref));
+  MEM_REF_BASE (memref) = normalize_base_addr (stmt, loop, &offset);
+  MEM_REF_MULT_IDX (memref) = DR_STEP (STMT_ATTR_DR (stmt));
+  if (tree_fits_uhwi_p (MEM_REF_MULT_IDX (memref)))
+    MEM_REF_INT_MULT_IDX (memref) =
+	tree_to_uhwi (MEM_REF_MULT_IDX (memref))/num;
+  else
+    MEM_REF_INT_MULT_IDX (memref) = -1;
+  MEM_REF_OFFSET (memref) = offset;
+  if (tree_fits_uhwi_p (MEM_REF_OFFSET (memref)))
+    MEM_REF_INT_OFFSET (memref) = tree_to_uhwi (MEM_REF_OFFSET (memref))/num;
+  else
+    MEM_REF_INT_OFFSET (memref) = -1;
+  MEM_REF_IS_READ (memref) = is_read;
+
+  return memref;
+}
+
+
+/***** Helper functions for prim-tree creation *****/
+struct primop_tree *
+init_primop_node (void)
+{
+  struct primop_tree *ptree;
+  ptree = (struct primop_tree *) xcalloc (1, sizeof (struct primop_tree));
+
+  PT_NODE_OP (ptree) = 0;
+  PT_ARITY (ptree) = 0;
+  ptree->children = vNULL;
+  PT_PARENT (ptree) = NULL;
+  PT_ITER_COUNT (ptree) = NULL;
+  PT_VEC_SIZE (ptree) = 0;
+  PT_VEC_TYPE (ptree) = NULL;
+  PT_VEC_INST (ptree) = vNULL;
+  PT_TARGET_COST (ptree) = 0;
+  PT_NUM_INSTANCES (ptree) = 0;
+  PT_LOOP_DEPENDENCES (ptree) = vNULL;
+  PT_DEP (ptree) = vNULL;
+  PT_DEPTH (ptree) = 0;
+  memset (&ptree->u, 0, sizeof (ptree->u));
+  return ptree;
+}
+
+struct primop_tree *
+populate_prim_node (enum primop_code pcode, struct ITER_node *inode,
+		    struct primop_tree *parent, gimple *stmt)
+{
+  struct primop_tree *ptree;
+  ptree = init_primop_node ();
+
+  PT_NODE_OP (ptree) = (int) pcode;
+  PT_PARENT (ptree) = parent;
+  PT_ITER_COUNT (ptree) = ITER_NODE_NITERS (inode);
+
+  if (stmt)
+    PT_VEC_TYPE (ptree) = STMT_ATTR_VECTYPE (stmt);
+
+  return ptree;
+}
+
+/* Check if a primop_tree node corresponding to MEMREF already exists in the
+   ITER_node INODE.  If yes, return the pointer to that primop_tree; else
+   return NULL.  */
+struct primop_tree *
+exists_primTree_with_memref (struct mem_ref *memref, struct ITER_node *inode)
+{
+  vec<struct primop_tree *> worklist;
+  worklist = (ITER_NODE_LOOP_BODY (inode)).copy ();
+
+  while (worklist.length () > 0)
+    {
+      primop_tree *ptree = worklist.pop ();
+      struct mem_ref *pmemref = PT_MEMVAL (ptree);
+
+      if (!operand_equal_p (MEM_REF_BASE (pmemref), MEM_REF_BASE (memref), 0))
+	continue;
+
+      if (!operand_equal_p (MEM_REF_MULT_IDX (pmemref),
+			    MEM_REF_MULT_IDX (memref), 0))
+	continue;
+
+      if (MEM_REF_IS_READ (pmemref) != MEM_REF_IS_READ (memref))
+	continue;
+
+      if (dump_enabled_p ())
+	{
+	  dump_printf_loc (MSG_NOTE, vect_location,
+			   " exists_primTree_with_memref: TRUE\n");
+	}
+
+      return ptree;
+    }
+
+  if (dump_enabled_p ())
+    {
+      dump_printf_loc (MSG_NOTE, vect_location,
+		       " exists_primTree_with_memref: FALSE\n");
+    }
+
+  return NULL;
+}
+
+/* Create a primtree of type mem_ref - used only for LOAD/STORE.  */
+struct primop_tree *
+create_primTree_memref (struct mem_ref *memref, struct ITER_node *inode,
+			struct primop_tree *parent)
+{
+  struct primop_tree * ptree;
+
+  ptree = populate_prim_node (POP_MEMREF, inode, parent, NULL);
+
+  PT_MEMVAL (ptree) = (struct mem_ref *) xcalloc (1, sizeof (struct mem_ref));
+  MEM_REF_BASE (PT_MEMVAL (ptree)) = unshare_expr (MEM_REF_BASE (memref));
+  MEM_REF_MULT_IDX (PT_MEMVAL (ptree)) =
+		 unshare_expr (MEM_REF_MULT_IDX (memref));
+  MEM_REF_INT_MULT_IDX (PT_MEMVAL (ptree)) = MEM_REF_INT_MULT_IDX (memref);
+  MEM_REF_INT_OFFSET (PT_MEMVAL (ptree)) = -1;
+  MEM_REF_OFFSET (PT_MEMVAL (ptree)) = NULL;
+  MEM_REF_IS_READ (PT_MEMVAL (ptree)) = MEM_REF_IS_READ (memref);
+
+  if (dump_enabled_p ())
+    {
+      dump_printf_loc (MSG_NOTE, vect_location,
+		       " create_primTree_memref: stride - %d\n",
+		       MEM_REF_INT_MULT_IDX (memref));
+#if 0
+      dump_generic_expr (MSG_NOTE, TDF_SLIM,
+			 MEM_REF_BASE (PT_MEMVAL (ptree)));
+      dump_printf (MSG_NOTE, "\n");
+      dump_generic_expr (MSG_NOTE, TDF_SLIM,
+			 MEM_REF_MULT_IDX (PT_MEMVAL (ptree)));
+      dump_printf (MSG_NOTE, "\n");
+#endif
+     }
+  return ptree;
+}
+
+/* Create a primtree with PCODE as interleave or concat.  STMT is the statement
+   for which the primtree is being created, and PARTS is the number of parts
+   being combined.  */
+struct primop_tree *
+create_primTree_combine (enum primop_code pcode, gimple *stmt, int parts,
+			 struct ITER_node *inode, struct primop_tree *parent)
+{
+  struct primop_tree * ptree;
+
+  ptree = populate_prim_node (pcode, inode, parent, stmt);
+  PT_OPERAND_SELECTOR (ptree) = -1;
+  PT_DIVISION (ptree) = parts;
+  PT_VAR_STRIDE (ptree) = NULL;
+  if (dump_enabled_p ())
+    {
+      dump_printf_loc (MSG_NOTE, vect_location,
+		       " create_primTree_combine: parts - %d\n", parts);
+    }
+
+  return ptree;
+}
+
+/* Create a primtree with PCODE as split or extract.  STMT is the statement for
+   which the primtree is being created.  PARTS is the number of partitions to
+   be created.  SELECTOR is the part being selected.  */
+struct primop_tree *
+create_primTree_partition (enum primop_code pcode, gimple *stmt, int parts,
+			   int selector, struct ITER_node *inode,
+			   struct primop_tree *parent)
+{
+  struct primop_tree * ptree;
+
+  ptree = populate_prim_node (pcode, inode, parent, stmt);
+  PT_OPERAND_SELECTOR (ptree) = selector;
+  PT_DIVISION (ptree) = parts;
+  PT_VAR_STRIDE (ptree) = NULL;
+
+  if (dump_enabled_p ())
+    {
+      dump_printf_loc (MSG_NOTE, vect_location,
+		   " create_primTree_partition : parts - %d selector - %d\n",
+		   parts, selector);
+    }
+
+  return ptree;
+}
+
+/* Attach PCHILD node as the IDX^th child of PTREE.  */
+void
+add_child_at_index (struct primop_tree *ptree,
+		    struct primop_tree *pchild, int idx)
+{
+  while (ptree->children.length () <= (unsigned) idx)
+    ptree->children.safe_push (NULL);
+  PT_CHILD (ptree, idx) = pchild;
+}
+
+/* Get the IDX^th child of PTREE.  */
+struct primop_tree *
+get_child_at_index (struct primop_tree *ptree, int idx)
+{
+  return PT_CHILD (ptree, idx);
+}
+
+
+
+/*
+  This function checks whether STMT is a STORE instruction which is
+  vectorizable for the target architecture.  If so, it computes the MEMREF
+  node with respect to the loop in INODE for which the vectorization is
+  targeted, and creates the primitive reorder (sub)tree for the store under
+  PARENT.
+
+  TODO: Currently, we are not handling variable strides (a variable
+  multiplicative index), as vectorization is not possible in all the cases
+  with a variable stride.
+
+  The variable step can be of 3 types:
+  1. Monotonically dependent on index
+  2. Non-monotonically dependent on index.
+  3. Loop invariant step.
+     a. Case for classic vectorization
+     b. Case for SLP
+
+  1. Monotonically dependent on index:
+  The address expression under consideration is base + step * index + offset.
+  Even if the induction variable is linear in nature, when the step is
+  monotonically dependent on the index - which is multiplied by the induction
+  variable - the expression is no longer linear but quadratic, so the loop
+  cannot be unrolled.
+
+  2. Non-monotonically dependent on index:
+  A variable step can cause RAW or WAW conflicts if it is not a monotonic
+  function of the index, in which case the optimization would be incorrect.
+   eg:
+    step = a[i];
+    c[step * i + d] = b[i]
+
+  If such a step appears on the RHS, there are no conflicts, but we cannot
+  determine at compile time which memory locations are accessed, so the
+  optimization is not possible.
+
+  3. Loop invariant step:
+   a) Classic auto-vectorization for stride = j.
+    eg:
+     j = op (v1, v2)
+     for (i = 0; i < 2048; i++)
+       a[i] = b[i*j]
+   Here we would need to extract every j-th element from array b, for which
+   instruction generation is very difficult as we don't know what permute
+   order to use.
+
+   If we had an instruction like (un)pack (vreg, reg), which (un)packs vector
+   elements whose index is a multiple of reg to form a new vector, we could
+   consider this optimization.
+
+   b) The only case in which a variable step can work unconditionally is when
+   the step is iteration-invariant, for which SLP might work if the SLP factor
+   is the same as the vector size.
+    eg:
+     For vector size = 4,
+     and a loop body as follows:
+
+     a[4 * i] = b[var * i]
+     a[4 * i + 1] = b[var * i + 1]
+     a[4 * i + 2] = b[var * i + 2]
+     a[4 * i + 3] = b[var * i + 3]
+
+   the above can be vectorized with a vector load at b[var * i], provided a
+   and b are distinct.
+
+   We looked at the paper "Automatic Vectorization of Interleaved Data
+   Revisited" for the implementation of this part; however, I personally don't
+   find it useful for MIPS.
+*/
+static struct primop_tree *
+vectorizable_store (gimple *stmt, struct ITER_node *inode,
+		    struct primop_tree *parent)
+{
+  tree src_op, dest_op;
+  enum tree_code code;
+  tree vec_type, scalar_type, rhs_vectype;
+  gimple *def_stmt;
+  machine_mode vec_mode;
+  struct loop *loop = ITER_NODE_LOOP (inode);
+  struct mem_ref *memref;
+  struct primop_tree *pnode, *pchild1, *pchild2;
+
+  if (STMT_ATTR_USE_TYPE (stmt) != stmt_use_type_intermediate)
+    return NULL;
+
+  if (!gimple_assign_single_p (stmt))
+    return NULL;
+
+  dest_op = gimple_assign_lhs (stmt);
+  code = TREE_CODE (dest_op);
+
+  if (code != ARRAY_REF
+      && code != BIT_FIELD_REF
+      && code != INDIRECT_REF
+      && code != COMPONENT_REF
+      && code != IMAGPART_EXPR
+      && code != REALPART_EXPR
+      && code != MEM_REF)
+    return NULL;
+
+  if (!STMT_ATTR_DR (stmt))
+    return NULL;
+
+  scalar_type = TREE_TYPE (DR_REF (STMT_ATTR_DR (stmt)));
+  vec_type = get_vectype_for_scalar_type (scalar_type);
+
+  src_op = gimple_assign_rhs1 (stmt);
+
+  if (!vect_is_simple_use (src_op, &def_stmt))
+    {
+      if (dump_enabled_p ())
+	dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
+			 "use not simple.\n");
+      return NULL;
+    }
+
+  rhs_vectype = def_stmt ? STMT_ATTR_VECTYPE (def_stmt) : NULL_TREE;
+  if (rhs_vectype && !useless_type_conversion_p (vec_type, rhs_vectype))
+    return NULL;
+
+
+  vec_mode = TYPE_MODE (vec_type);
+  if (optab_handler (mov_optab, vec_mode) == CODE_FOR_nothing)
+    return NULL;
+
+  /* The dataref for store is available.  Analyze it, and create memref for the
+     same.  */
+
+  memref = vect_create_memref_ptr (stmt, loop, false);
+  if (!memref)
+    return NULL;
+
+  if (MEM_REF_INT_MULT_IDX (memref) == -1 || MEM_REF_INT_OFFSET (memref) == -1)
+    {
+      if (dump_enabled_p ())
+	dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
+			 "Variable stride or offset.\n");
+      return NULL;
+    }
+
+  if (dump_enabled_p ())
+    {
+      dump_printf_loc (MSG_NOTE, vect_location,
+		       " vectorizable_store: ptree creation\n");
+    }
+
+
+  pnode = exists_primTree_with_memref (memref, inode);
+  if (pnode == NULL)
+    {
+      pnode = create_primTree_memref (memref, inode, NULL);
+      ITER_NODE_LOOP_BODY (inode).safe_push (pnode);
+      pchild1 = create_primTree_combine (POP_ILV, stmt,
+					 MEM_REF_INT_MULT_IDX (memref),
+					 inode, parent);
+      add_child_at_index (pnode, pchild1, 0);
+    }
+  else
+    {
+      pchild1 = get_child_at_index (pnode, 0);
+    }
+  if (def_stmt)
+    {
+      pchild2 = analyze_and_create_ptree (pchild1, def_stmt, inode);
+      if (pchild2 == NULL)
+	return NULL;
+    }
+  else
+    {
+      /* RHS is not an SSA name - it is either invariant or constant.  Create
+	 an appropriate primtree node accordingly.  */
+      /* TODO - Create POP_CONST or POP_INV node - which will create the
+	 vector using VEC_INIT at later stages.  */
+      return NULL;
+    }
+
+  if (MEM_REF_INT_OFFSET (memref) >= PT_DIVISION (pchild1))
+    {
+      if (dump_enabled_p ())
+	{
+	  dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
+			   "Not vectorized: offset >= stride\n");
+	}
+      return NULL;
+    }
+
+  add_child_at_index (pchild1, pchild2, MEM_REF_INT_OFFSET (memref));
+
+  return pnode;
+}
+
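+/* Check whether STMT is a LOAD instruction which is vectorizable for the
+   target architecture.  If so, build a mem_ref for it and return an EXTR
+   node (selecting element <offset> out of every <stride> elements) whose
+   child is the MEMREF node for the load, reusing an existing MEMREF node
+   from INODE if one matches.  Return NULL if the load cannot be handled.  */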
+static struct primop_tree *
+vectorizable_load (gimple *stmt, struct ITER_node *inode,
+		   struct primop_tree *parent)
+{
+  tree src_op, dest_op;
+  enum tree_code code;
+  tree vec_type, scalar_type;
+  machine_mode vec_mode;
+  struct loop * loop = ITER_NODE_LOOP (inode);
+  struct mem_ref *memref;
+  struct primop_tree *pnode, *pchild1;
+
+
+  if (STMT_ATTR_USE_TYPE (stmt) != stmt_use_type_intermediate)
+    return NULL;
+
+  if (!gimple_assign_single_p (stmt))
+    return NULL;
+
+  dest_op = gimple_assign_lhs (stmt);
+  src_op = gimple_assign_rhs1 (stmt);
+  code = TREE_CODE (src_op);
+
+  if (code != ARRAY_REF
+      && code != BIT_FIELD_REF
+      && code != INDIRECT_REF
+      && code != COMPONENT_REF
+      && code != IMAGPART_EXPR
+      && code != REALPART_EXPR
+      && code != MEM_REF)
+    return NULL;
+
+  if (!STMT_ATTR_DR (stmt))
+    return NULL;
+
+
+  scalar_type = TREE_TYPE (DR_REF (STMT_ATTR_DR (stmt)));
+  vec_type = get_vectype_for_scalar_type (scalar_type);
+
+  vec_mode = TYPE_MODE (vec_type);
+  if (optab_handler (mov_optab, vec_mode) == CODE_FOR_nothing)
+    return NULL;
+
+  /* The dataref for load is available.  Analyze it, and create memref for the
+     same.  */
+
+  memref = vect_create_memref_ptr (stmt, loop, true);
+  if (!memref)
+    return NULL;
+
+  if (MEM_REF_INT_MULT_IDX (memref) == -1 || MEM_REF_INT_OFFSET (memref) == -1)
+    {
+      if (dump_enabled_p ())
+	dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
+			 "Variable stride or offset.\n");
+      return NULL;
+    }
+
+  gcc_assert (parent);
+
+  if (dump_enabled_p ())
+    {
+      dump_printf_loc (MSG_NOTE, vect_location,
+		       " vectorizable_load: ptree creation\n");
+    }
+
+  pnode = create_primTree_partition (POP_EXTR, stmt,
+		MEM_REF_INT_MULT_IDX (memref),
+		MEM_REF_INT_OFFSET (memref), inode, parent);
+  pchild1 = exists_primTree_with_memref (memref, inode);
+  if (pchild1 == NULL)
+    {
+      pchild1 = create_primTree_memref (memref, inode, pnode);
+    }
+  add_child_at_index (pnode, pchild1, 0);
+  return pnode;
+}
+
+/*
+  FORALL probable root p-nodes DO
+    FUNCTION analyze_and_create_ptree ():
+    - If the result is stored in memory, try to access the address - extract
+      the <base, idx, mult_idx, offset> tuple from the gimple chain; it has a
+      fixed format as follows:
+
+	_1 = PHI (var_loop, var_inv) <== Name _1 is idx
+	_2 = _1 + <offset>
+	_3 = _2 * <mult_idx> * <scalar_type_size>
+	_4 = <base> + _3
+	*_4 = op (v1, v2, ...)
+
+      - If <offset> >= <mult_idx> : Cannot vectorize currently.  (Optimization
+	TODO: create new induction variable ind_var and adjust the offset to
+	<offset> % <mult_idx>, and ind_var = idx + <offset>/<mult_idx>).
+      - Else
+	1. Create root node with mem_ref <base,idx> (if it does not already
+	   exist; if it exists, go to step 3).
+	2. Attach child - ILV node with arity = <mult_idx> to root node.
+	3. Create a child to ILV at index <offset>:
+	   - analyze_and_create_ptree (RHS).
+
+    - If <op> is vectorizable, create c-node with arity = arity (op), and attach
+      the children:
+      - analyze_and_create_ptree (v1)
+      - analyze_and_create_ptree (v2)
+      -  :
+      -  :
+
+    - If a use accesses memory - try to access the address - create a mem_ref
+      node as above.
+      1. Create EXTR node with arity = <mult_idx> and extr_idx = <offset>
+      2. Create a child node with mem_ref <base, idx>
+
+    - If the result is an outgoing PHI node, there should be a single use node:
+      - Access the definition of the use node.
+      - If the op on the LHS of the definition is collapsible, generate a
+	COLPS <op, PHI name> in the epilogue.
+      - Create root node with memref<PHI name, loop ind_var>
+      - Create ITER node with single child - analyze_and_create (<second_opd>)
+*/
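+/* Illustrative example (hypothetical loop, not from the testsuite): for
+
+     for (i = 0; i < N; i++)
+       {
+	 a[2 * i] = b[3 * i + 1];
+	 a[2 * i + 1] = b[3 * i];
+       }
+
+   the store handler creates MEMREF<a, stride 2> with an ILV child of 2 parts,
+   and each load hangs an EXTR node over the (shared) MEMREF node for b at the
+   offset of the corresponding store, giving roughly:
+
+     MEMREF<a, stride 2>
+       ILV<2 parts>
+	 [0] EXTR<3 parts, sel 1> -> MEMREF<b, stride 3>
+	 [1] EXTR<3 parts, sel 0> -> MEMREF<b, stride 3>  (same node reused)  */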
+struct primop_tree *
+analyze_and_create_ptree (struct primop_tree *parent, gimple *stmt,
+			  struct ITER_node *inode)
+{
+  struct primop_tree *ptree;
+  if (dump_enabled_p ())
+    {
+      dump_printf_loc (MSG_NOTE, vect_location,
+		 " analyze_and_create_ptree: ");
+      dump_gimple_stmt (MSG_NOTE, TDF_SLIM, stmt, 0);
+    }
+
+  if ( (ptree = vectorizable_store (stmt, inode, parent)) == NULL
+      && (ptree = vectorizable_load (stmt, inode, parent)) == NULL
+      /* && (ptree = vectorizable_assign (stmt, inode, parent)) == NULL
+	 && (ptree = vectorizable_reduction (stmt, inode, parent)) == NULL
+	 && (ptree = vectorizable_arith (stmt, inode, parent)) == NULL */
+     )
+    {
+      if (dump_enabled_p ())
+	dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
+		 "Not vectorized: Target does not support instruction.\n");
+    }
+  return ptree;
+}
+
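+/* Build p-trees for the loop described by INODE: mark the probable root
+   statements, classify all statements of the loop, and then create a p-tree
+   rooted at each probable root.  Return true if every root could be converted
+   into a p-tree.  */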
+static bool
+create_ptree (struct ITER_node *inode)
+{
+  auto_vec<gimple *, 64> worklist;
+  bool is_ok;
+
+  mark_probable_root_nodes (inode, &worklist);
+
+  if (worklist.length () == 0)
+    {
+      dump_printf (MSG_MISSED_OPTIMIZATION,
+	"Not vectorizable: no probable root nodes.\n");
+      return false;
+    }
+
+  is_ok = classify_loop_stmts (inode, &worklist);
+  if (! is_ok)
+    {
+      dump_printf (MSG_MISSED_OPTIMIZATION,
+	"Not vectorizable: Classification failed.\n");
+      return false;
+    }
+
+  while (worklist.length () > 0)
+    {
+      is_ok = (analyze_and_create_ptree (NULL, worklist.pop (), inode) != NULL);
+      if (! is_ok)
+	{
+  	  dump_printf (MSG_MISSED_OPTIMIZATION,
+		"Not vectorized: p-tree creation failed.\n");
+	  return false;
+	}
+    }
+
+  return true;
+}
+
+
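+/* Analyze the data references of the loop in INODE, their access patterns and
+   their dependences, and if everything is fine, build the p-trees for the
+   loop body.  Return true on success.  */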
+static bool
+vect_analyze_loop_with_prim_tree_2 (struct ITER_node *inode)
+{
+  bool ok;
+  int max_vf = MAX_VECTORIZATION_FACTOR;
+  int min_vf = 2;
+  unsigned int n_stmts = 0;
+  auto_vec<loop_p, 64> loop_nest;
+
+  /* Find all data references in the loop (which correspond to vdefs/vuses)
+     and analyze their evolution in the loop.  */
+
+  struct loop *loop = ITER_NODE_LOOP (inode);
+  basic_block *bbs = get_loop_body (ITER_NODE_LOOP (inode));
+  if (!find_loop_nest (loop, &loop_nest))
+    {
+      if (dump_enabled_p ())
+	dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
+			 "not vectorized: loop contains function calls"
+			 " or data references that cannot be analyzed\n");
+      return false;
+    }
+
+  for (unsigned i = 0; i < loop->num_nodes; i++)
+    for (gimple_stmt_iterator gsi = gsi_start_bb (bbs[i]);
+	 !gsi_end_p (gsi); gsi_next (&gsi))
+      {
+	gimple *stmt = gsi_stmt (gsi);
+	if (is_gimple_debug (stmt))
+	  continue;
+	++n_stmts;
+	if (!find_data_references_in_stmt (loop, stmt,
+					   &ITER_NODE_DATA_REFS (inode)))
+	  {
+	    if (is_gimple_call (stmt) && loop->safelen)
+	      {
+		tree fndecl = gimple_call_fndecl (stmt), op;
+		if (fndecl != NULL_TREE)
+		  {
+		    cgraph_node *node = cgraph_node::get (fndecl);
+		    if (node != NULL && node->simd_clones != NULL)
+		      {
+			unsigned int j, n = gimple_call_num_args (stmt);
+			for (j = 0; j < n; j++)
+			  {
+			    op = gimple_call_arg (stmt, j);
+			    if (DECL_P (op)
+				|| (REFERENCE_CLASS_P (op)
+				    && get_base_address (op)))
+			      break;
+			  }
+			op = gimple_call_lhs (stmt);
+			// Ignore #pragma omp declare simd functions
+			//   if they don't have data references in the
+			//   call stmt itself.
+			if (j == n
+			    && !(op
+				 && (DECL_P (op)
+				     || (REFERENCE_CLASS_P (op)
+					 && get_base_address (op)))))
+			  continue;
+		      }
+		  }
+	      }
+	    if (dump_enabled_p ())
+	      dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
+			       "not vectorized: loop contains function "
+			       "calls or data references that cannot "
+			       "be analyzed\n");
+	    return false;
+	  }
+      }
+  if (!vect_analyze_datarefs (inode))
+    {
+      if (dump_enabled_p ())
+	dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
+			 "bad data references.\n");
+      return false;
+    }
+
+  ok = vect_analyze_dataref_accesses (inode);
+  if (!ok)
+    {
+      if (dump_enabled_p ())
+	dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
+			 "bad data access.\n");
+      return false;
+    }
+
+  ok = vect_analyze_dataref_dependences (inode, loop_nest);
+  if (!ok
+      || max_vf < min_vf)
+    {
+      if (dump_enabled_p ())
+	dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
+			 "bad data dependence.\n");
+      return false;
+    }
+
+  return create_ptree (inode);
+}
+
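+/* Analysis driver for a single LOOP: build the ITER node for the loop and try
+   to analyze it, retrying with the other vector sizes supported by the target
+   until the analysis succeeds or no vector sizes are left.  Return the ITER
+   node on success, NULL otherwise.  */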
+static struct ITER_node *
+vect_analyze_loop_with_prim_tree (struct loop *loop)
+{
+  struct ITER_node * t_iter_node;
+  unsigned int vector_sizes;
+
+  /* Autodetect first vector size we try.  */
+  current_vector_size = 0;
+  vector_sizes = targetm.vectorize.autovectorize_vector_sizes ();
+
+  if (dump_enabled_p ())
+    dump_printf_loc (MSG_NOTE, vect_location,
+		     "===== analyze_loop_nest =====\n");
+
+  /*if (loop_outer (loop) && (struct ITER_node *) loop->aux)
+    {
+      if (dump_enabled_p ())
+	dump_printf_loc (MSG_NOTE, vect_location,
+			 "outer-loop already vectorized.\n");
+      return NULL;
+    }*/
+
+  while (1)
+    {
+      /* Check the CFG characteristics of the loop (nesting, entry/exit).  */
+      t_iter_node = vect_populate_iter_node_from_loop (loop);
+      if (!t_iter_node)
+	{
+	  if (dump_enabled_p ())
+	    dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
+			     "bad loop form.\n");
+	  return NULL;
+	}
+
+      if (vect_analyze_loop_with_prim_tree_2 (t_iter_node))
+	{
+	  return t_iter_node;
+	}
+
+      destroy_iter_node_info (t_iter_node);
+
+      vector_sizes &= ~current_vector_size;
+      if (vector_sizes == 0
+	  || current_vector_size == 0)
+	return NULL;
+
+      /* Try the next biggest vector size.  */
+      current_vector_size = 1 << floor_log2 (vector_sizes);
+      if (dump_enabled_p ())
+	dump_printf_loc (MSG_NOTE, vect_location,
+			 "***** Re-trying analysis with "
+			 "vector size %d\n", current_vector_size);
+    }
+} // vect_analyze_loop_with_prim_tree
+
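+/* Dump the contents of ITER node INODE.  TODO: not implemented yet.  */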
+void
+dump_iter_node (struct ITER_node *inode)
+{
+
+} // dump_iter_node
+
+
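+/* Entry point of the pass: analyze each loop that is a candidate for
+   vectorization using the unified representation, build its ITER node and
+   p-trees, and (once the transformation phase is in place) vectorize it.
+   Returns TODO_* flags for the pass manager.  */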
+unsigned
+vectorize_loops_using_uniop (void)
+{
+  unsigned int vect_loops_num;
+  struct loop *loop;
+  bool any_ifcvt_loops = false;
+  unsigned int num_vectorized_loops = 0;
+  unsigned int ret = 0;
+  unsigned int i;
+//  hash_table<simd_array_to_simduid> *simd_array_to_simduid_htab = NULL;
+  vect_loops_num = number_of_loops (cfun);
+
+  /* Bail out if there are no loops.  */
+  if (vect_loops_num <= 1)
+    return 0;
+
+  if (cfun->has_simduid_loops)
+    return 0;
+//    note_simd_array_uses (&simd_array_to_simduid_htab);
+
+  iter_node = NULL;
+
+  init_stmt_attr_vec ();
+
+  FOR_EACH_LOOP (loop, 0)
+    if (loop->dont_vectorize)
+      any_ifcvt_loops = true;
+    else if (loop->simduid)
+      continue;
+    else if ((flag_tree_loop_vectorize
+	      && optimize_loop_nest_for_speed_p (loop))
+	     || loop->force_vectorize)
+      {
+	/* Vectorization may be possible.  Check whether all statements are
+	   vectorizable, and if so, create the p-tree.  */
+	vect_location = find_loop_location (loop);
+	if (LOCATION_LOCUS (vect_location) != UNKNOWN_LOCATION
+	    && dump_enabled_p ())
+	  dump_printf (MSG_NOTE, "\nAnalyzing loop at %s:%d\n",
+		       LOCATION_FILE (vect_location),
+		       LOCATION_LINE (vect_location));
+
+	iter_node = vect_analyze_loop_with_prim_tree (loop);
+	loop->aux = iter_node;
+
+	if (iter_node == NULL)
+	  continue;
+
+	if (!dbg_cnt (vect_loop))
+	  {
+	    any_ifcvt_loops = true;
+	    break;
+	  }
+
+	if (dump_enabled_p ())
+	  {
+	    dump_printf (MSG_NOTE, "\nLoop is vectorizable.\n");
+	    dump_iter_node (iter_node);
+	  }
+
+	gimple *loop_vectorized_call = vect_loop_vectorized_call (loop);
+	/* If the loop is vectorized, set uid of stmts within scalar loop to
+	   0.  This change is needed if transform phase uses this loop info.  */
+	/*if (loop_vectorized_call)
+	  set_uid_loop_bbs (loop_vinfo, loop_vectorized_call);*/
+
+	// TODO: Insert call to transformation entry point.
+
+	num_vectorized_loops++;
+	/* Now that the loop has been vectorized, allow it to be unrolled
+	   etc.  */
+	loop->force_vectorize = false;
+	if (loop_vectorized_call)
+	  {
+	    fold_loop_vectorized_call (loop_vectorized_call, boolean_true_node);
+	    ret |= TODO_cleanup_cfg;
+	  }
+      }
+
+
+  vect_location = UNKNOWN_LOCATION;
+
+  statistics_counter_event (cfun, "Vectorized loops", num_vectorized_loops);
+  if (dump_enabled_p ()
+      || (num_vectorized_loops > 0 && dump_enabled_p ()))
+    dump_printf_loc (MSG_NOTE, vect_location,
+		     "vectorized %u loops in function.\n",
+		     num_vectorized_loops);
+
+
+  if (any_ifcvt_loops)
+    for (i = 1; i < vect_loops_num; i++)
+      {
+	loop = get_loop (cfun, i);
+	if (loop && loop->dont_vectorize)
+	  {
+	    gimple *g = vect_loop_vectorized_call (loop);
+	    if (g)
+	      {
+		fold_loop_vectorized_call (g, boolean_false_node);
+		ret |= TODO_cleanup_cfg;
+	      }
+	  }
+      }
+
+  for (i = 1; i < vect_loops_num; i++)
+    {
+      struct ITER_node *inode;
+
+      loop = get_loop (cfun, i);
+      if (!loop)
+	continue;
+      inode = (struct ITER_node *) loop->aux;
+      if (inode)
+	destroy_iter_node_info (inode);
+      loop->aux = NULL;
+    }
+
+  free_stmt_attr_vec ();
+
+  if (num_vectorized_loops > 0)
+    {
+      /* If we vectorized any loop only virtual SSA form needs to be updated.
+	 ???  Also while we try hard to update loop-closed SSA form we fail
+	 to properly do this in some corner-cases (see PR56286).  */
+      rewrite_into_loop_closed_ssa (NULL, TODO_update_ssa_only_virtuals);
+      return TODO_cleanup_cfg;
+    }
+
+  return ret;
+} // vectorize_loops_using_uniop
+#endif
Index: gcc/tree-vect-unified.h
===================================================================
--- gcc/tree-vect-unified.h	(revision 0)
+++ gcc/tree-vect-unified.h	(working copy)
@@ -0,0 +1,293 @@
+/* Loop Vectorization using unified representation for permute instructions.
+   Copyright (C) 2003-2015 Free Software Foundation, Inc.
+   Contributed by Sameera Deshpande <sameera.deshpa...@imgtec.com>
+
+This file is part of GCC.
+
+GCC is free software; you can redistribute it and/or modify it under
+the terms of the GNU General Public License as published by the Free
+Software Foundation; either version 3, or (at your option) any later
+version.
+
+GCC is distributed in the hope that it will be useful, but WITHOUT ANY
+WARRANTY; without even the implied warranty of MERCHANTABILITY or
+FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
+for more details.
+
+You should have received a copy of the GNU General Public License
+along with GCC; see the file COPYING3.  If not see
+<http://www.gnu.org/licenses/>.  */
+
+
+/* Prim-op ITER : ITER node has 3 main objectives in loop representation using
+   primitive-ops for reordering -
+   1) Enlist all p-trees representing live destination vectors written within
+      the loop.
+   2) Record preparatory operations, loop transformations and cleanup operations
+      needed to enable vectorization.
+   3) Record number of iterations once vector-size reduction is applied on
+      primitive trees.  */
+
+struct ITER_node {
+  /* Preamble */
+
+  /* Loop for which the ITER node is created.  */
+  struct loop *loop;
+
+  /* Number of iterations - At the beginning, it is 1.  After vector-size
+     reduction operation, it has appropriate value.  */
+  tree num_iter;
+
+  /* Preparatory statements to be emitted before vectorized loop body.  */
+  vec<gimple *> prep_stmts;
+
+  /* If loop peeling is needed - before/after depending upon this variable.  */
+  int loop_peel_needed;
+
+  /* Actual loop body */
+
+  /* This vector enlists the statements which are live beyond the loop.  Each
+     iteration performs all the operations in this vector in the same order.
+     In most cases, this vector holds the perm-trees responsible for updating
+     the destination vectors.  */
+  vec<struct primop_tree *> stmts;
+
+  /* Epilogue */
+
+  /* When we have grouped data accesses with gaps, we may introduce invalid
+     memory accesses.  We peel the last iteration of the loop to prevent
+     this.  */
+  bool peeling_for_gaps;
+
+  /* When the number of iterations is not a multiple of the vector size
+     we need to peel off iterations at the end to form an epilogue loop.  */
+  bool peeling_for_niter;
+
+  /* Concluding operations to be performed after loop body - e.g: collapse op
+     on temporary vectors.  */
+  vec<gimple *> finish_stmts;
+
+  /* All data references within the loop.  */
+  vec<data_reference_p> datarefs;
+
+  /* All data dependences within the loop.  */
+  vec<ddr_p> ddrs;
+};
+
+#define ITER_NODE_NITERS(x) (x)->num_iter
+#define ITER_NODE_NITERS_KNOWN_P(x) \
+  (tree_fits_shwi_p ((x)->num_iter) && tree_to_shwi ((x)->num_iter) > 0)
+#define ITER_NODE_LOOP(x) (x)->loop
+#define ITER_NODE_PROLOGUE(x) (x)->prep_stmts
+#define ITER_NODE_LOOP_BODY(x) (x)->stmts
+#define ITER_NODE_EPILOGUE(x) (x)->finish_stmts
+#define ITER_NODE_LOOP_PEEL_NEEDED(x) (x)->loop_peel_needed
+#define ITER_NODE_PEELING_FOR_GAPS(x) (x)->peeling_for_gaps
+#define ITER_NODE_PEELING_FOR_NITER(x) (x)->peeling_for_niter
+#define ITER_NODE_DATA_REFS(x) (x)->datarefs
+#define ITER_NODE_DATA_DEPS(x) (x)->ddrs
+
+enum stmt_use_type {
+  stmt_use_type_undef,
+  stmt_use_type_scalar,
+  stmt_use_type_loop_invariant,
+  stmt_use_type_induction,
+  stmt_use_type_reduction,
+  stmt_use_type_intermediate,
+  stmt_use_type_complex,
+  stmt_use_type_loop_exit_ctrl
+};
+
+struct stmt_attr {
+  enum stmt_vec_info_type type;
+  enum stmt_use_type use_type;
+  tree access_fn;
+  struct primop_tree *ptree;
+  bool probable_root;
+  struct data_reference *dr;
+  tree vectype;
+};
+
+#define STMT_ATTR_USE_TYPE(s) (get_stmt_attr (s))->use_type
+#define STMT_ATTR_ACCESS_FN(s) (get_stmt_attr (s))->access_fn
+#define STMT_ATTR_TREE(s) (get_stmt_attr (s))->ptree
+#define STMT_ATTR_PROOT(s) (get_stmt_attr (s))->probable_root
+#define STMT_ATTR_DR(s) (get_stmt_attr (s))->dr
+#define STMT_ATTR_VECTYPE(s) (get_stmt_attr (s))->vectype
+
+vec<struct stmt_attr *> stmt_attr_vec;
+
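+/* Create the vector holding per-statement attributes, accessed via the
+   STMT_ATTR_* macros.  */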
+void
+init_stmt_attr_vec (void)
+{ 
+  gcc_assert (!stmt_attr_vec.exists ());
+  stmt_attr_vec.create (50);
+}
+
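+/* Release the per-statement attribute vector.  */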
+void
+free_stmt_attr_vec (void)
+{ 
+  gcc_assert (stmt_attr_vec.exists ());
+  stmt_attr_vec.release ();
+}
+
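+/* Record INFO as the attribute block of STMT.  The statement's uid is used as
+   an index into stmt_attr_vec; a statement with uid 0 gets a fresh slot, and
+   a statement which already has a uid may only have its slot cleared
+   (INFO == NULL).  */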
+static inline void
+set_stmt_attr (gimple *stmt, struct stmt_attr *info)
+{ 
+  unsigned int uid = gimple_uid (stmt);
+  if (uid == 0)
+    { 
+      gcc_checking_assert (info);
+      uid = stmt_attr_vec.length () + 1;
+      gimple_set_uid (stmt, uid);
+      stmt_attr_vec.safe_push (info);
+    }
+  else
+    {
+      gcc_checking_assert (info == NULL);
+      stmt_attr_vec[uid - 1] = info;
+    }
+}
+
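+/* Return the attribute block of STMT, or NULL if STMT has not been assigned
+   one yet (its uid is 0).  */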
+static inline struct stmt_attr *
+get_stmt_attr (gimple *stmt)
+{ 
+  unsigned int uid = gimple_uid (stmt);
+  if (uid == 0)
+    return NULL;
+
+  return stmt_attr_vec[uid - 1];
+}
+
+/* Information about a memory reference, used to construct the p-tree for each
+   memory reference.  The same mem_ref is reused if all fields, including
+   IS_READ, are the same for different references.  */
+struct mem_ref {
+  /* Base address.  */
+  tree base;
+  /* Interleave factor.  */
+  tree mult_idx;
+  int int_mult_idx;
+  /* Offset.  */
+  tree offset;
+  int int_offset;
+  /* Loop-invariant base address.  It can be the same as BASE for simple
+     aggregates or arrays, and the loop-invariant part of the address for
+     nested loops, multi-dimensional arrays or compound aggregates.  */
+//  tree base_inv;
+  /* If the memory is referenced for reading or writing.  */
+  bool is_read;
+};
+
+#define MEM_REF_BASE(x) (x)->base
+#define MEM_REF_BASE_INV(x) (x)->base_inv
+#define MEM_REF_MULT_IDX(x) (x)->mult_idx
+#define MEM_REF_INT_MULT_IDX(x) (x)->int_mult_idx
+#define MEM_REF_OFFSET(x) (x)->offset
+#define MEM_REF_INT_OFFSET(x) (x)->int_offset
+#define MEM_REF_IS_READ(x) (x)->is_read
+
+/* PRIMOP_TREE : Memory accesses within a loop have a definite repetitive
+   pattern which can be captured using primitive permute operators, which in
+   turn determine the desired permute order for the vector computations.  The
+   PRIMOP_TREE is an AST which records all computations and permutations
+   required to store the destination vector into contiguous memory at the end
+   of all iterations of the loop.  */
+struct primop_tree {
+  /* If the tree-node is iter, compute (c-node) or permute (p-node).  */
+//  int node_type;
+
+  /* Operation.  */
+  int node_op;
+
+  /* Arity of the operation.  */
+  int arity;
+
+  /* List of children to this node.  */
+  vec<struct primop_tree *> children;
+
+  /* Parent node.  */
+  struct primop_tree *parent;
+
+  /* Number of iterations - At the beginning, it is loop_count.  After vec-size
+     reduction operation, it is changed to vectorization factor for the
+     operation.  */
+  tree iter_count;
+
+  /* Actual vector size supported by target.  */
+  int vec_size;
+
+  /* The vector type which should be used to vectorize this node.  */
+  tree vec_type;
+
+  /* Set of vector instructions to represent this node.  */
+  vec<gimple *> vec_inst;
+
+  /* Instruction cost for vec_size.  */
+  int target_cost;
+
+  /* Number of instances of vec_inst needed for iter_count.  */
+  int instances;
+
+  /* If the tree is loop-variant, the loops on which this tree depends.  */
+  /* TODO: I am not very sure if we need all the ITERs or just the innermost
+     affecting loop.  However, for now, keep the list of all loops from
+     innermost to outermost.  */
+  vec<struct ITER_node *> loop_dependences;
+
+  /* Dependence links if any to other statements.  */
+  vec<ddr_p> dependences;
+
+  /* Depth within sub-tree of same type.  */
+  int depth;
+
+  union {
+    /* In case of c-node, the gimple statement corresponding to the c-op.  */
+    gimple * gimple_for_comp;
+
+    /* In case of permute-node, some permute-specific attributes.  */
+    struct perm_node {
+      /* The component to be selected for EXTRACT or SPLIT op.  */
+      int opd_selector;
+
+      /* Number of partitions for permute op.  In case of variable mult-idx,
+	 this gives ARITY for ILV and CONCAT as well.  For constant mult-idx,
+	 ARITY = DIVISION.  */
+      int division;
+      tree *var_stride;
+    } val;
+
+    /* ITER-node representing inner loop, if any.  */
+    struct ITER_node * inode;
+
+    /* mem_ref without offset information.  */
+    struct mem_ref * memval;
+  } u;
+};
+
+#define PT_NODE_OP(x) (x)->node_op
+#define PT_ARITY(x) (x)->arity
+#define PT_CHILD(x,i) (x)->children[i]
+#define PT_PARENT(x) (x)->parent
+#define PT_ITER_COUNT(x) (x)->iter_count
+#define PT_VEC_SIZE(x) (x)->vec_size
+#define PT_VEC_TYPE(x) (x)->vec_type
+#define PT_VEC_INST(x) (x)->vec_inst
+#define PT_TARGET_COST(x) (x)->target_cost
+#define PT_NUM_INSTANCES(x) (x)->instances
+#define PT_LOOP_DEPENDENCES(x) (x)->loop_dependences
+#define PT_DEP(x) (x)->dependences
+#define PT_DEPTH(x) (x)->depth
+#define PT_COMPUTE_STMT(x) (x)->u.gimple_for_comp
+#define PT_OPERAND_SELECTOR(x) (x)->u.val.opd_selector
+#define PT_DIVISION(x) (x)->u.val.division
+#define PT_VAR_STRIDE(x) (x)->u.val.var_stride
+#define PT_INODE(x) (x)->u.inode
+#define PT_MEMVAL(x) (x)->u.memval
+
+struct ITER_node *iter_node;
+extern unsigned int vectorize_loops_using_uniop (void);
+extern struct primop_tree * analyze_and_create_ptree (struct primop_tree *,
+		 gimple *, struct ITER_node *);
+
+enum primop_code {
+  POP_ILV = MAX_TREE_CODES,
+  POP_CONCAT,
+  POP_EXTR,
+  POP_SPLT,
+  POP_COLLAPSE,
+  POP_MEMREF,
+  POP_CONST,
+  POP_INV
+};
Index: gcc/tree-vectorizer.c
===================================================================
--- gcc/tree-vectorizer.c	(revision 236862)
+++ gcc/tree-vectorizer.c	(working copy)
@@ -395,7 +395,7 @@
 /* If LOOP has been versioned during ifcvt, return the internal call
    guarding it.  */
 
-static gimple *
+gimple *
 vect_loop_vectorized_call (struct loop *loop)
 {
   basic_block bb = loop_preheader_edge (loop)->src;
@@ -431,7 +431,7 @@
 /* Fold LOOP_VECTORIZED internal call G to VALUE and
    update any immediate uses of it's LHS.  */
 
-static void
+void
 fold_loop_vectorized_call (gimple *g, tree value)
 {
   tree lhs = gimple_call_lhs (g);
Index: gcc/tree-vectorizer.h
===================================================================
--- gcc/tree-vectorizer.h	(revision 236862)
+++ gcc/tree-vectorizer.h	(working copy)
@@ -733,7 +733,7 @@
 
 /* Return the earlier statement between STMT1 and STMT2.  */
 
-static inline gimple *
+inline gimple *
 get_earlier_stmt (gimple *stmt1, gimple *stmt2)
 {
   unsigned int uid1, uid2;
@@ -1033,6 +1033,9 @@
 extern bool vect_prune_runtime_alias_test_list (loop_vec_info);
 extern tree vect_check_gather_scatter (gimple *, loop_vec_info, tree *, tree *,
 				       int *);
+extern tree vect_check_gather_scatter_1 (gimple *, tree, bool, struct loop *,
+					 tree, tree *, tree *, int *);
+
 extern bool vect_analyze_data_refs (vec_info *, int *);
 extern tree vect_create_data_ref_ptr (gimple *, tree, struct loop *, tree,
 				      tree *, gimple_stmt_iterator *,
@@ -1059,6 +1062,7 @@
 extern tree vect_create_addr_base_for_vector_ref (gimple *, gimple_seq *,
 						  tree, struct loop *,
 						  tree = NULL_TREE);
+extern bool vect_mark_for_runtime_alias_test_1 (ddr_p, loop *);
 
 /* In tree-vect-loop.c.  */
 /* FORNOW: Used in tree-parloops.c.  */
@@ -1070,6 +1074,8 @@
 /* Drive for loop transformation stage.  */
 extern void vect_transform_loop (loop_vec_info);
 extern loop_vec_info vect_analyze_loop_form (struct loop *);
+extern bool vect_analyze_loop_form_1 (struct loop *, gcond **, tree *, tree *,
+				      gcond **);
 extern bool vectorizable_live_operation (gimple *, gimple_stmt_iterator *,
 					 gimple **);
 extern bool vectorizable_reduction (gimple *, gimple_stmt_iterator *,
@@ -1081,6 +1087,7 @@
 					stmt_vector_for_cost *,
 					stmt_vector_for_cost *,
 					stmt_vector_for_cost *);
+extern bool vect_is_simple_iv_evolution (unsigned, tree, tree *, tree *);
 
 /* In tree-vect-slp.c.  */
 extern void vect_free_slp_instance (slp_instance);
@@ -1110,5 +1117,7 @@
 unsigned vectorize_loops (void);
 void vect_destroy_datarefs (vec_info *);
 bool vect_stmt_in_region_p (vec_info *, gimple *);
+extern void fold_loop_vectorized_call (gimple *, tree);
+extern gimple *vect_loop_vectorized_call (struct loop *);
 
 #endif  /* GCC_TREE_VECTORIZER_H  */
