These are long integers, right? Otherwise the array could almost 
fit into the 4 MB cache of an Itanium or Woodcrest.

However, even that would not be enough for Power5, which has a massive 
36 MB L3 cache. By the way, we have Power5 and PowerPC machines available
if you want to try them.
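
To make the size argument concrete, here is a rough sketch of the
arithmetic. The element sizes (4 bytes for int, 8 bytes for long) and the
reading of "4MB" and "36 MB" as MiB are my assumptions, not something the
benchmark reports; the function names are made up for illustration. With
4-byte ints, N=1,000,000 comes to 4,000,000 bytes, just under the
4,194,304 bytes of a 4 MiB cache; with 8-byte longs it is 8,000,000 bytes.

```c
#include <assert.h>
#include <stddef.h>

/* Bytes touched when each of n_elems array members is loaded once.
   Hypothetical helper, not part of the original tester. */
static size_t footprint_bytes(size_t n_elems, size_t elem_size)
{
    assert(elem_size > 0);
    return n_elems * elem_size;
}

/* Nonzero if the whole array fits in a cache of cache_bytes.
   This looks at capacity only; it ignores associativity and any
   other data competing for the cache. */
static int fits_in_cache(size_t n_elems, size_t elem_size, size_t cache_bytes)
{
    return footprint_bytes(n_elems, elem_size) <= cache_bytes;
}
```

Under these assumptions, fits_in_cache(1000000, 8, 36u * 1024 * 1024) is
true, so a larger N would be needed to get out of the Power5 L3.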

> I wrote a pthread-based latency tester in which each thread accesses N
> integers in random order, loading each member of the array exactly once.
> All the numbers below are for N=1,000,000 integers.
> 
> The first number is the latency per thread, so it increases with memory
> contention.  The second number is the "effective" ns, where I divide
> the run time[1] of all threads by the number of integers retrieved.
> It should decrease with increased threads if the machine has the CPU
> and memory system parallelism to avoid contention.
> 
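
As I read the description, the two numbers could be computed roughly as
in the sketch below. The struct and function names are hypothetical, and I
am assuming that "integers retrieved" means the total over all threads
(threads times N) and that all timestamps come from one shared clock.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-thread record; times in ns from one shared clock. */
struct thread_times {
    double start_ns;
    double finish_ns;
};

/* Run time as in footnote [1]: max(finish times) - min(start times). */
static double run_time_ns(const struct thread_times *t, size_t n_threads)
{
    assert(n_threads > 0);
    double min_start = t[0].start_ns;
    double max_finish = t[0].finish_ns;
    for (size_t i = 1; i < n_threads; i++) {
        if (t[i].start_ns < min_start)
            min_start = t[i].start_ns;
        if (t[i].finish_ns > max_finish)
            max_finish = t[i].finish_ns;
    }
    return max_finish - min_start;
}

/* "Effective" ns: run time divided by all loads done by all threads
   (assumed to be n_threads * loads_per_thread). */
static double effective_ns(const struct thread_times *t, size_t n_threads,
                           size_t loads_per_thread)
{
    assert(loads_per_thread > 0);
    return run_time_ns(t, n_threads) /
           ((double)n_threads * (double)loads_per_thread);
}

/* Per-thread latency: one thread's elapsed time over its own loads. */
static double per_thread_ns(const struct thread_times *t,
                            size_t loads_per_thread)
{
    assert(loads_per_thread > 0);
    return (t->finish_ns - t->start_ns) / (double)loads_per_thread;
}
```

With one thread the two numbers coincide, which matches the first column
of the table.
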
>                               1 thread           2 threads          4 threads
> Dual Opteron 275[2]           83.69ns/83.69ns    80ns/52.08ns       85ns/21.72ns
> Quad Opteron 846[3]           108.07ns/108.07ns  115ns/61.39ns      110ns/27.89ns
> Dual Woodcrest-2.66[2]        107.18ns/107.18ns  108ns/54.03ns      118ns/29.69ns
> Dual core AMD64-2.2GHz[5]     89.45ns/89.45ns    89.45ns/44.72ns    145ns/52.76ns
> AMD64 3200[4]-2.0GHz          69.74ns/69.74ns    69ns/69.31ns       137ns/69.85ns
> Dual socket Nocona 3.4GHz[6]  130.45ns/130.45ns  133ns/66.72ns      230ns/67.72ns
> Dual core p4-3.0[6]           115.45ns/115.46ns  185ns/101.03ns     283ns/92.67ns
> Dual it2-1.4GHz[6]            200.47ns/200.47ns  203ns/101.92ns     362ns/101.57ns
> 
> I'm happy to say that Pathscale, Intel, GCC-3, and GCC-4 all deliver
> nearly identical performance, although I had to be very careful with
> Pathscale to keep the benchmark routine from being optimized away.
> 
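
For readers wondering how a loop like this can survive the optimizer,
here is one possible approach; I do not know how the actual tester does
it. This sketch swaps in a pointer chase over a random single-cycle
permutation (Sattolo's algorithm), so each load depends on the previous
one and every element is visited exactly once, and it stores the final
index to a volatile variable so the compiler cannot discard the loop.
All names are made up, and the elements here are size_t indices rather
than the plain integers of the original.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Fill next[] with a random permutation that forms one single cycle
   (Sattolo's algorithm). rand() is used only for brevity: RAND_MAX may
   be as small as 32767, so the shuffle is not uniform for large n,
   although the result is still a single cycle. */
static void build_cycle(size_t *next, size_t n, unsigned seed)
{
    assert(n > 1);
    for (size_t i = 0; i < n; i++)
        next[i] = i;
    srand(seed);
    for (size_t i = n - 1; i > 0; i--) {
        size_t j = (size_t)rand() % i;      /* j in [0, i-1], never i */
        size_t tmp = next[i];
        next[i] = next[j];
        next[j] = tmp;
    }
}

/* Writing the result here makes the loop's outcome observable. */
static volatile size_t sink;

/* Follow the cycle for n dependent loads, starting at index 0. */
static size_t chase(const size_t *next, size_t n)
{
    size_t idx = 0;
    for (size_t i = 0; i < n; i++)
        idx = next[idx];                    /* each load needs the last */
    sink = idx;
    return idx;
}
```

Because the permutation is a single cycle of length n, chase() ends back
at index 0 after n loads, which also gives a cheap sanity check.
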
> Anyone have a Rev F Opteron handy?
> 
> [1] Where runtime = max(finishtimes)-min(starttimes)
> [2] Dual socket, dual core = 4 cores
> [3] Quad socket, single core = 4 cores
> [4] Single core/single socket = 1 core
> [5] Dual core/single socket = 2 cores
> [6] Dual socket, single core = 2 cores.
> 
> -- 
> Bill Broadley
> Computational Science and Engineering
> UC Davis
> _______________________________________________
> Beowulf mailing list, Beowulf@beowulf.org
> To change your subscription (digest mode or unsubscribe) 
> visit http://www.beowulf.org/mailman/listinfo/beowulf
> 
