On 25/07/18 04:52, David Mathog wrote:
> One possibility is that at the "leading" edge the first job that reads a
> section of data will do so slowly, while later jobs will take the same
> data out of cache. That will lead to a "peloton" sort of effect, where
> the leader is slowed and the followers accelerated. iostat didn't show
> very much disk IO though.
I have to admit that was my first thought too. I also started to speculate about power saving, but I couldn't see how that would let the later jobs catch up as much as they do.

One fun thing would be to turn HT off, set the scheduler to run 20 jobs at a time, and see if it still happens then. Perhaps also run this step under "perf record" to capture profile data for each job, then look to see if you can spot differences across all the runs (a rough sketch of the idea is below). Not sure if there are scripts to do that, or how easy it would be to rig up (plus of course the extra I/O of recording the traces will perturb the system).

A very interesting problem!

All the best,
Chris
--
 Chris Samuel : http://www.csamuel.org/ : Melbourne, VIC
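A minimal sketch of how that might be rigged up, assuming the jobs are launched from a shell loop; the job name ./myjob, the input naming, the CPU number, and NJOBS=20 are placeholders rather than anything from the original thread:

  #!/bin/bash
  # Sketch only: run each copy of the job under "perf record" with its
  # own output file, so the profiles can be compared afterwards.
  # "./myjob" and the chunk naming are hypothetical placeholders.
  NJOBS=20

  # Quick check of the page-cache theory first: the second read of the
  # same input should be much faster if caching is what accelerates
  # the followers.
  time cat input.dat > /dev/null    # cold read (disk)
  time cat input.dat > /dev/null    # warm read (page cache)

  # To take HT out of the picture without a reboot, each core's sibling
  # thread can be taken offline, e.g.
  #   echo 0 > /sys/devices/system/cpu/cpu20/online
  # (sibling pairs are listed in .../topology/thread_siblings_list).

  for i in $(seq 1 "$NJOBS"); do
      perf record -o "perf.job${i}.data" -- ./myjob "chunk${i}" &
  done
  wait

  # Once a fast run and a slow run have been identified, diff their
  # profiles (the first argument is the baseline):
  perf diff perf.job1.data perf.job20.data

perf diff treats its first argument as the baseline, so comparing a known-fast profile against a known-slow one should highlight which symbols account for the extra time.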