> the memory access pattern.  The main reason is that the Opteron only has
> eight 2-Mbyte TLB entries, compared to 512 4-Kbyte TLB entries (see

which seems great to me: 8 x 2 MB covers up to 16 MB without a TLB miss, vs. only 2 MB for 512 x 4 KB...

> below).  So, an app that accesses lots of little regions of memory
> scattered all over the place will probably be hurt by using large
> pages.

I find that statement a bit misleading; consider a case where I'm
iterating through a 16 MB region, touching one word at 4 KB strides.
8 x 2 MB pages will be golden, whereas small pages (only 2 MB of reach)
would thrash badly.
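To see why both the original claim and this counterexample can hold, here is a toy model in Python. It assumes a single-level, fully associative TLB with LRU replacement. That is a simplification: the Opteron has two TLB levels, and its L2 4-Kbyte TLB is 4-way set associative. It counts misses for the 16 MB stride sweep described above. It also counts misses for a hypothetical "scattered" pattern of 64 small regions, each in a different 2 MB chunk. The `count_tlb_misses` helper and the 4 MB region spacing are illustrative inventions, not anything from the thread.

```python
from collections import OrderedDict

KB = 1024
MB = 1024 * KB


def count_tlb_misses(addresses, entries, page_size):
    """Count misses in a toy fully associative TLB with LRU replacement.

    Simplification (assumption): one TLB level, fully associative, LRU.
    """
    tlb = OrderedDict()  # page number -> None, ordered least to most recent
    misses = 0
    for addr in addresses:
        page = addr // page_size
        if page in tlb:
            tlb.move_to_end(page)        # hit: mark as most recently used
        else:
            misses += 1
            tlb[page] = None
            if len(tlb) > entries:
                tlb.popitem(last=False)  # evict the least recently used page
    return misses


# Sweep a 16 MB region touching one word every 4 KB, two passes.
sweep = [off for _ in range(2) for off in range(0, 16 * MB, 4 * KB)]

# Hypothetical scattered pattern: 64 small regions spaced 4 MB apart
# (so each lands in a different 2 MB page), revisited 10 times.
scattered = [r * 4 * MB for _ in range(10) for r in range(64)]

for name, trace in [("sweep", sweep), ("scattered", scattered)]:
    big = count_tlb_misses(trace, entries=8, page_size=2 * MB)
    small = count_tlb_misses(trace, entries=512, page_size=4 * KB)
    print(f"{name}: 2M pages {big} misses, 4K pages {small} misses")
```

In this model the sweep costs 8 misses with large pages but misses on every access with 4 KB pages, because LRU cycling over 4096 pages never hits in 512 entries. The scattered pattern is the reverse: 64 distinct 2 MB pages keep evicting each other from 8 entries, while 64 small pages fit easily in 512.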

> Anybody know if recent Intel processors have the same issue?

I don't really see how it could be avoided...

> obtained using the cpuid instruction on an Opteron 146:
>
> L1 2-Mbyte TLB:
>    DTLB entries       = 8
>    ITLB entries       = 8
>    DTLB associativity = full
>    ITLB associativity = full
>
> L1 4-Kbyte TLB:
>    DTLB entries       = 32
>    ITLB entries       = 32
>    DTLB associativity = full
>    ITLB associativity = full
>
> L2 4-Kbyte TLB:
>    DTLB entries       = 512
>    ITLB entries       = 512
>    DTLB associativity = 4
>    ITLB associativity = 4
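On AMD processors these figures come from the extended CPUID leaves 0x80000005 (L1 TLBs) and 0x80000006 (L2 TLBs). The Python sketch below decodes those register bit fields. The field layout and the associativity encoding are as I understand AMD's CPUID documentation and should be checked against it. The register values are hypothetical. They were built by hand to reproduce the Opteron 146 numbers above, not read from hardware.

```python
def l1_assoc(code: int) -> str:
    """L1 TLB associativity field: 0xFF means fully associative, else n-way."""
    return "full" if code == 0xFF else str(code)


def l2_assoc(code: int) -> str:
    """L2 TLB 4-bit associativity field (subset of AMD's encoding)."""
    names = {0x0: "disabled", 0x1: "direct", 0x2: "2", 0x4: "4",
             0x6: "8", 0x8: "16", 0xF: "full"}
    return names.get(code, f"encoding {code:#x}")


def decode_l1_tlb(reg: int) -> dict:
    """Decode an L1 TLB register (EAX or EBX) from CPUID leaf 0x80000005."""
    return {
        "DTLB entries": (reg >> 16) & 0xFF,
        "ITLB entries": reg & 0xFF,
        "DTLB associativity": l1_assoc((reg >> 24) & 0xFF),
        "ITLB associativity": l1_assoc((reg >> 8) & 0xFF),
    }


def decode_l2_tlb(reg: int) -> dict:
    """Decode an L2 TLB register (EAX or EBX) from CPUID leaf 0x80000006."""
    return {
        "DTLB entries": (reg >> 16) & 0xFFF,
        "ITLB entries": reg & 0xFFF,
        "DTLB associativity": l2_assoc((reg >> 28) & 0xF),
        "ITLB associativity": l2_assoc((reg >> 12) & 0xF),
    }


# Hypothetical register values chosen to match the Opteron 146 table.
samples = {
    "L1 2-Mbyte TLB (0x80000005 EAX)": decode_l1_tlb(0xFF08FF08),
    "L1 4-Kbyte TLB (0x80000005 EBX)": decode_l1_tlb(0xFF20FF20),
    "L2 4-Kbyte TLB (0x80000006 EBX)": decode_l2_tlb(0x42004200),
}

for title, fields in samples.items():
    print(title)
    for key, value in fields.items():
        print(f"   {key:<18} = {value}")
```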

The Intel doc I looked at listed up to 128 entries for 4K pages and 64 entries for 2M or 4M pages.
It didn't seem to address Core 2, though, which probably has more than the Pentium M.
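TLB reach is just entries times page size, which is where the "16 MB vs only 2 MB" comparison comes from. This small Python sketch tabulates reach for the figures quoted in this thread. It assumes that each Intel number describes a single TLB of that many entries, which is my reading of the doc as summarized above. The `tlb_reach` helper is illustrative.

```python
KB = 1024
MB = 1024 * KB


def tlb_reach(entries: int, page_size: int) -> int:
    """Bytes mappable without a TLB miss: number of entries x page size."""
    return entries * page_size


# (label, entries, page size) from the figures quoted in this thread.
tlbs = [
    ("Opteron 146 L1, 2M pages", 8, 2 * MB),
    ("Opteron 146 L2, 4K pages", 512, 4 * KB),
    ("Intel doc, 4K pages", 128, 4 * KB),
    ("Intel doc, 2M pages", 64, 2 * MB),
    ("Intel doc, 4M pages", 64, 4 * MB),
]

for label, entries, page in tlbs:
    print(f"{label:<26} {tlb_reach(entries, page) // KB:>7} KB")
```

On these numbers the Intel large-page TLB would map 128 MB (2M pages) or 256 MB (4M pages). Its 4K TLB would map only 512 KB.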
_______________________________________________
Beowulf mailing list, Beowulf@beowulf.org
To change your subscription (digest mode or unsubscribe) visit 
http://www.beowulf.org/mailman/listinfo/beowulf
