Can someone explain to me why 2MB can be faster than 16MB?
Surely 16MB without a TLB miss is always better than 2MB, isn't it?
Especially considering the chips have a 2MB L2 cache and in my case are
accessing the cache in a shared manner.
Thanks,
Vincent
----- Original Message -----
From: <[EMAIL PROTECTED]>
To: <beowulf@beowulf.org>
Sent: Tuesday, July 25, 2006 9:17 PM
Subject: Re: [Beowulf] Feedback on large pages in Linux
the memory access pattern. The main reason is that the Opteron only has
eight 2-Mbyte TLB entries, compared to 512 4-Kbyte TLB entries (see
below). So, an app that accesses lots of little regions of memory
scattered all over the place will probably be hurt by using large
pages.
Eight 2-Mbyte entries seems great to me: up to 16 MB without a TLB miss,
vs only 2 MB with 512 4-Kbyte entries...
I find that statement a bit misleading; consider a case where I'm
iterating through a 16M region, touching 1 word at 4k strides.
8x2M pages will be golden, whereas small pages would thrash badly.
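To put numbers on the strided example above, here is a minimal sketch of the arithmetic in C. The function names tlb_reach and pages_touched are made up for illustration, and the model is deliberately crude: it only counts distinct pages and ignores associativity and replacement policy.

```c
#include <assert.h>
#include <stddef.h>

/* TLB reach: bytes that can be mapped without a miss
   (number of entries times page size). */
static size_t tlb_reach(size_t entries, size_t page_size)
{
    return entries * page_size;
}

/* Count the distinct pages touched when walking `region` bytes in steps
   of `stride` bytes, assuming the region starts on a page boundary. */
static size_t pages_touched(size_t region, size_t stride, size_t page_size)
{
    assert(stride > 0 && page_size > 0);

    size_t pages = 0;
    size_t last_page = (size_t)-1;   /* sentinel: no page seen yet */

    for (size_t off = 0; off < region; off += stride) {
        size_t page = off / page_size;
        if (page != last_page) {     /* offsets only grow, so this counts each page once */
            pages++;
            last_page = page;
        }
    }
    return pages;
}
```

With a 16 MB region and a 4 KB stride, pages_touched reports 4096 distinct 4 KB pages, far more than the 512 L2 TLB entries, but only 8 distinct 2 MB pages, which is exactly the eight large-page entries. The reach figures come out as 8 x 2 MB = 16 MB and 512 x 4 KB = 2 MB.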
Anybody know if recent Intel processors have the same issue?
I don't really see how it could be avoided...
obtained using cpuid instruction on an Opteron 146...
L1 2-Mbyte TLB:
  DTLB entries = 8
  ITLB entries = 8
  DTLB associativity = full
  ITLB associativity = full
L1 4-Kbyte TLB:
  DTLB entries = 32
  ITLB entries = 32
  DTLB associativity = full
  ITLB associativity = full
L2 4-Kbyte TLB:
  DTLB entries = 512
  ITLB entries = 512
  DTLB associativity = 4
  ITLB associativity = 4
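For anyone who wants to reproduce a listing like this, here is a rough sketch of decoding the TLB fields in C. It assumes the AMD extended CPUID layout as I understand it (function 0x80000005 for the L1 TLBs, 0x80000006 for the L2 TLB); the struct and function names are made up for illustration, and this is not necessarily how the numbers above were obtained.

```c
#include <stdint.h>

/* Entry counts and raw associativity fields for one TLB level/page size. */
struct tlb_info {
    unsigned dtlb_entries, dtlb_assoc;
    unsigned itlb_entries, itlb_assoc;
};

/* L1 TLB register (assumed: Fn 0x80000005, EAX = 2M/4M pages, EBX = 4K pages):
   [31:24] DTLB assoc, [23:16] DTLB entries,
   [15:8]  ITLB assoc, [7:0]   ITLB entries.
   An associativity field of 0xFF means fully associative. */
static struct tlb_info decode_l1_tlb(uint32_t reg)
{
    struct tlb_info t;
    t.dtlb_assoc   = (reg >> 24) & 0xFF;
    t.dtlb_entries = (reg >> 16) & 0xFF;
    t.itlb_assoc   = (reg >> 8)  & 0xFF;
    t.itlb_entries =  reg        & 0xFF;
    return t;
}

/* L2 TLB register (assumed: Fn 0x80000006, EBX = 4K pages):
   [31:28] DTLB assoc code, [27:16] DTLB entries,
   [15:12] ITLB assoc code, [11:0]  ITLB entries.
   The associativity is an encoded value; code 0x4 stands for 4-way.
   Other codes are returned raw and not interpreted here. */
static struct tlb_info decode_l2_tlb(uint32_t reg)
{
    struct tlb_info t;
    t.dtlb_assoc   = (reg >> 28) & 0xF;
    t.dtlb_entries = (reg >> 16) & 0xFFF;
    t.itlb_assoc   = (reg >> 12) & 0xF;
    t.itlb_entries =  reg        & 0xFFF;
    return t;
}
```

With GCC or Clang on x86, the raw register values can be read with __get_cpuid() from <cpuid.h>, e.g. __get_cpuid(0x80000005, &eax, &ebx, &ecx, &edx), and then passed to the decoders. A register value of 0xFF08FF08 would decode to the "L1 2-Mbyte TLB" block above, and 0x42004200 to the "L2 4-Kbyte TLB" block.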
The Intel doc I looked at listed up to 128 entries for 4K pages and 64
entries for 2M or 4M pages. It didn't seem to address Core 2, though,
which probably has more than the Pentium M.
_______________________________________________
Beowulf mailing list, Beowulf@beowulf.org
To change your subscription (digest mode or unsubscribe) visit
http://www.beowulf.org/mailman/listinfo/beowulf