All:
I am trying to place the following analysis in the GCC vectorizer. I expect it
to improve the vectorizer considerably for unit-stride, zero-stride and
non-unit-stride memory accesses.
First, a topological sort is performed on the Data Dependence Graph (DDG).
Then a timestamp is assigned to each node of the topologically sorted DDG
using the following algorithm.
For each node in topologically sorted order in the DDG
{
    Timestamp = 0;
    Timestamp(node) = Max(Timestamp, Timestamp of all predecessors) + 1;
}
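To make the timestamp step concrete, here is a minimal Python sketch of how I
read the loop above. It is only an illustration, not GCC code: the function
name assign_timestamps and the dictionary-based DDG representation are
assumptions made for the example, and Kahn's algorithm is used for the
topological sort simply because it is short.

```python
from collections import deque


def assign_timestamps(preds):
    """Return {node: timestamp} for a DDG given as {node: [predecessors]}.

    Assumption for this sketch: every predecessor also appears as a key.
    """
    # Build successor lists and in-degree counts for Kahn's topological sort.
    succs = {n: [] for n in preds}
    indeg = {n: len(ps) for n, ps in preds.items()}
    for n, ps in preds.items():
        for p in ps:
            succs[p].append(n)

    ready = deque(n for n in preds if indeg[n] == 0)
    timestamp = {}
    while ready:
        n = ready.popleft()
        # Max over the predecessors (0 when there are none), plus 1.
        timestamp[n] = max((timestamp[p] for p in preds[n]), default=0) + 1
        for s in succs[n]:
            indeg[s] -= 1
            if indeg[s] == 0:
                ready.append(s)

    if len(timestamp) != len(preds):
        raise ValueError("dependence graph has a cycle")
    return timestamp


# Hypothetical DDG: c depends on a and b, d on c, e on a.
example = {"a": [], "b": [], "c": ["a", "b"], "d": ["c"], "e": ["a"]}
stamps = assign_timestamps(example)
```

In this example a and b have no predecessors and get timestamp 1, c and e get
2, and d gets 3. A node's timestamp is one more than the largest timestamp
among its predecessors, so two nodes with the same timestamp cannot depend on
each other.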
Based on the timestamps computed above, the DDG is partitioned. Each partition
contains the nodes with the same timestamp, so the nodes in a partition can be
vectorized, as they are independent of each other in the DDG. To enable
vectorization, each partition is divided into sub-partitions according to
whether its accesses are contiguous or non-contiguous. The memory addresses of
all the operands of each node in a partition are sorted in increasing or
decreasing order. Based on this sorted order, the partition is divided into
sub-partitions of unit-stride accesses, zero-stride accesses, and accesses that
require shuffling of operands through vector instructions.
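The partitioning and sub-partitioning described above could be sketched
roughly as follows. This is a simplified illustration under my own
assumptions, not an implementation proposal: the names partition_by_timestamp
and sub_partition are hypothetical, addresses are plain byte offsets for a
single operand position, and an access that forms no unit-stride or
zero-stride run with its neighbour is simply put in the "needs-shuffle" class.

```python
from collections import defaultdict


def partition_by_timestamp(timestamp):
    """Group nodes sharing a timestamp; such nodes are mutually independent."""
    parts = defaultdict(list)
    for node, ts in timestamp.items():
        parts[ts].append(node)
    return dict(parts)


def sub_partition(accesses, elem_size):
    """Split one partition's accesses into runs by stride.

    accesses: list of (node, byte_address) for one operand position.
    Returns a list of (kind, [nodes]) in increasing address order.
    """
    ordered = sorted(accesses, key=lambda a: a[1])
    groups = []  # each entry: (kind or None, nodes, last address seen)
    for node, addr in ordered:
        if groups:
            kind, nodes, last = groups[-1]
            delta = addr - last
            if delta == 0:
                step = "zero-stride"
            elif delta == elem_size:
                step = "unit-stride"
            else:
                step = None
            # Extend the current run only if the stride kind stays the same.
            if step is not None and kind in (None, step):
                groups[-1] = (step, nodes + [node], addr)
                continue
        groups.append((None, [node], addr))
    # Runs of length one matched no stride pattern; assume they need a shuffle.
    return [(kind or "needs-shuffle", nodes) for kind, nodes, _ in groups]


parts = partition_by_timestamp({"a": 1, "b": 1, "c": 2, "d": 3, "e": 2})
subs = sub_partition(
    [("n0", 100), ("n1", 104), ("n2", 108), ("n3", 200), ("n4", 200), ("n5", 300)],
    elem_size=4,
)
```

With 4-byte elements, n0..n2 form a unit-stride run, n3 and n4 read the same
address (zero stride), and n5 is left on its own as an access that would need
shuffling or a layout change before it can be combined with others.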
The above analysis will help in performing data layout on the partitioned nodes
of the DDG, based on the sub-partitions formed above. Performing data layout on
the non-contiguous accesses enables more vectorization opportunities, and the
sub-partitions with contiguous accesses help vectorization as they are.
Thoughts?
Thanks & Regards
Ajit