On Thu, Apr 18, 2013 at 7:21 PM, Mark Hahn <h...@mcmaster.ca> wrote:
>> Only for benchmarking? We have done this for years on our production
>> clusters (and SGI provides a tool that does this and more to clean up
>> nodes). We have this in our epilogue so that we can clean out memory on
>> our diskless nodes so there is nothing stale sitting around that can
>> impact the next user's job.
>
> understood, but how did you decide that was actually a good thing?

Mark,

Because it stopped the random out-of-memory conditions that we were having.

> if two jobs with similar file reference patterns run, for instance,
> drop_caches will cause quite a bit of additional IO delay.

For our workloads, this is a highly unlikely scenario: nodes are not shared
and the workload is very diverse, so the chance of the next job having any
connection to the previous job is negligible.

Craig

> I guess the rationale would also be much clearer for certain workloads,
> such as big-data reduction jobs, where things like executables would have
> to be re-fetched, but presumably much larger input data might never be
> re-referenced by following jobs. it would have to be jobs that have a lot
> of intra- but not inter-job read-only file re-reference,
> and where clean-page scavenging is a noticeable cost.
>
> I'm guessing this may have been a much bigger deal on strongly NUMA
> machines of a certain era (high-memory ia64 SGI, older kernels).
>
> regards, mark.
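(For context, the mechanism under discussion is the Linux drop_caches
interface. A minimal epilogue sketch, assuming root on the compute node and
a shell-script epilogue of the Torque/SGE style, might look like the
following; this is illustrative only, not the poster's actual script:

    #!/bin/sh
    # Hypothetical epilogue sketch: flush dirty pages first so that
    # drop_caches only has clean data left to discard.
    sync
    # 1 = free pagecache, 2 = free reclaimable slab (dentries/inodes),
    # 3 = both. Only clean, reclaimable kernel caches are released;
    # application memory is never touched.
    echo 3 > /proc/sys/vm/drop_caches

Because only clean caches are dropped, the cost Mark describes is the extra
IO needed for a following job to re-read anything it would otherwise have
found already cached.)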