------- Comment #32 from eyal at geomage dot com 2008-02-12 11:28 -------
(In reply to comment #31)
> > I would appreciate, however, a further explanation about this issue.
>
> The explanation has to do with CPU architecture and is not related to
> compilers. On a cache miss, a memory load or store takes tens of CPU cycles
> instead of the few cycles it takes on a cache hit.
>
> When we run:
>   time ./mvec 400000 1 29720 1000
> the program performs 400000 iterations of the outer loop and 29720 iterations
> of the inner loop. The inner loop performs 3 load accesses and one store
> access per iteration. Starting from the second iteration of the outer loop,
> all 29720 elements of the arrays pSum, pSum1 and pVec1 will already be in the
> cache, and from that point on all accesses will be cache hits (assuming the
> data cache is big enough to hold all 29720*3 elements).
>
> Let's look at the slow run:
>   % time ./TestVec 92200 8 89720 1000
> Here the program performs (89720-8) iterations of the inner loop, so in order
> to get cache hits most of the time, the cache needs to hold at least 89712*3
> elements. Consider what happens if the cache is only half that size. After
> the first iteration of the outer loop completes, the cache holds the second
> half of the arrays' data. At the start of the second iteration of the outer
> loop, all elements from the first half will already have been evicted,
> because most caches use an LRU policy to choose victims. Given that the
> PPC970 is an out-of-order, multiple-issue architecture, we can guess why the
> CPU has enough time to perform the arithmetic even in scalar form without
> adding any overhead relative to the vectorized version of the inner loop.
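(For reference, a minimal sketch of the kind of loop being discussed. The real
benchmark source is not quoted in this comment, so the function signature, the
float element type, and the mapping of command-line arguments to loop bounds
are assumptions taken from the description above, not the actual code.)

  /* Sketch of the described access pattern: 3 loads (pSum[i], pSum1[i],
     pVec1[i]) and 1 store (pSum[i]) per inner-loop iteration.
     Element type float and loop-bound names are assumptions. */
  void mvec(float *pSum, const float *pSum1, const float *pVec1,
            long outer, long start, long end)
  {
      for (long j = 0; j < outer; j++)        /* e.g. 400000 iterations */
          for (long i = start; i < end; i++)  /* e.g. 29720 or 89720-8  */
              pSum[i] += pSum1[i] * pVec1[i];
  }

  /* Working-set estimate, assuming 4-byte float elements:
       fast run : 29720 * 3 * 4 bytes ~= 348 KB
       slow run : 89712 * 3 * 4 bytes ~= 1.03 MB
     The slow run's working set exceeds a 512 KB L2 (the PPC970's L2 size,
     assumed here), so each outer iteration streams the arrays through the
     cache again and most inner-loop accesses miss. */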
Thanks a lot for the detailed explanation, Victor. I'll try to see if I can
break up the real code to be more memory friendly. Again, thanks a lot, guys.

eyal


-- 
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=35117