> separate file:
>
>       SUBROUTINE MAIN(L, M, N)
> C        U, V, T, Q are automatic arrays, sized by the dummy arguments
>       REAL U(L, M, N), V(L, M, N), T(L, M, N), Q(L, M, N)
>       ....
>       etc.
>
> I.e., the memory used (automatic arrays) comes from the stack (at
> least, that's how most Fortran compilers would implement it).
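(in C terms, those automatic arrays lower to roughly a C99 VLA, i.e. a
pure stack allocation; a rough sketch, not any particular compiler's
actual output, and the name sub is made up:)

    void sub(int l, int m, int n)
    {
        /* sized at call time and allocated on the stack, just like
           the fortran automatic arrays above (index order reversed) */
        float u[n][m][l], v[n][m][l], t[n][m][l], q[n][m][l];
        /* ... use u, v, t, q ... */
    }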
yeesh. for what it's worth, I sat in on an OLS (Ottawa Linux Symposium)
BOF today which discussed this issue (using big pages in linux). from
what I gather, the kernel VM maintainers don't want to complicate the
VM by mixing normal and big pages (4K and 2M on ia32/x86_64). so big
pages are allocated via hugetlbfs today, and this mechanism will be
converted to mmapping a block device in the future. but in either case,
this is not a mechanism which integrates readily into malloc or fortran
stack allocation.
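to make that concrete, here's roughly what getting big pages from
hugetlbfs looks like today (a minimal sketch, untested; the /mnt/huge
mount point and file name are made up, and the length has to be a
multiple of the huge page size):

    #include <stdio.h>
    #include <fcntl.h>
    #include <unistd.h>
    #include <sys/mman.h>

    #define LENGTH (16UL * 1024 * 1024)   /* multiple of the 2M page size */

    int main(void)
    {
        /* hugetlbfs must already be mounted, e.g. on /mnt/huge */
        int fd = open("/mnt/huge/scratch", O_CREAT | O_RDWR, 0600);
        if (fd < 0) { perror("open"); return 1; }

        void *p = mmap(NULL, LENGTH, PROT_READ | PROT_WRITE,
                       MAP_SHARED, fd, 0);
        if (p == MAP_FAILED) { perror("mmap"); return 1; }

        /* p is now backed by big pages -- but note there's no clean
           way to hand this region to malloc or to the fortran stack */

        munmap(p, LENGTH);
        close(fd);
        unlink("/mnt/huge/scratch");
        return 0;
    }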
it may be possible to implement a certain amount of transparency
(falling back from big to small pages, and coalescing contiguous runs
of small pages into big ones), perhaps purely at user level.
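for instance, a user-level allocator could try hugetlbfs first and fall
back quietly to normal pages (another untested sketch; big_alloc and
the mount point are made up, and len must be rounded up to the huge
page size for the first branch to succeed):

    #define _GNU_SOURCE
    #include <fcntl.h>
    #include <unistd.h>
    #include <stddef.h>
    #include <sys/mman.h>

    /* try big pages via hugetlbfs; fall back to ordinary 4K pages if
       the mount is missing or the huge page pool is exhausted */
    void *big_alloc(size_t len)
    {
        int fd = open("/mnt/huge/pool", O_CREAT | O_RDWR, 0600);
        if (fd >= 0) {
            void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                           MAP_SHARED, fd, 0);
            close(fd);
            if (p != MAP_FAILED)
                return p;                   /* got big pages */
        }
        return mmap(NULL, len, PROT_READ | PROT_WRITE,
                    MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);  /* 4K fallback */
    }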
it would be incredibly useful if the HPC community could come up with
some actual numbers on how important big pages are. I'm ashamed to say
that I haven't made any effort to measure this on any of my clusters.
oprofile appears to make it pretty easy to capture real data on how
often TLBs are a problem.
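something like this should capture it (untested; the event name is
cpu-specific, so run ophelp to find your chip's dtlb-miss event --
DTLB_MISSES and ./your_app below are just placeholders):

    opcontrol --vmlinux=/boot/vmlinux --event=DTLB_MISSES:50000
    opcontrol --start
    ./your_app
    opcontrol --stop
    opreport -l ./your_app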
have any of you tried to get a handle on this stuff?
thanks, mark hahn.