Hi all,

Thought some of you might find this interesting.

Using the WGS (aka CA, aka Celera) genome assembler there is a step that runs a large number of overlap comparison jobs (in this instance, 47634). There are N sequences (many millions, of three different types); the step carves them into many sequence ranges and compares the ranges pairwise, e.g. 100-200 vs. 1200-1300. A job scheduler keeps 40 jobs going at all times, but during a run the jobs are independent: they do not communicate with each other or with the job controller.

The initial observation was that "top" showed a very nonrandom distribution of elapsed times: large numbers of jobs (20 or 30) appeared to have correlated elapsed times. So the end times for the jobs were determined and stored in a histogram with 1-minute-wide bins. When plotted, it shows the job end times clumping up, along with what could be beat frequencies. I did not run this through any sort of autocorrelation analysis, but the patterns are easily seen by eye; see for instance the region around minutes 6200-6400. The patterns evolve over time, possibly because of differences in the regions of data. (Note: a script was changed around minute 2738, so don't compare patterns before that with patterns after it.)

The jobs were all running single-threaded and were pretty much nailed at 99.9% CPU usage except when they started up or shut down. Each wrote its output through a gzip process to a compressed file, and they all seemed to be writing more or less all the time. However, the gzip processes used a negligible fraction of the CPU time.

That histogram data is in end_times_histo.txt.gz, in the 6th or so post here:

   https://github.com/alekseyzimin/masurca/issues/45

The subrange data for the jobs is in ovlopt.gz.
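
For anyone who wants to check the periodicity more rigorously than by eye, a quick autocorrelation along these lines would do it. This is only a sketch: it assumes the decompressed histogram is plain text with one count per line (one 1-minute bin per line), which may not match the actual layout of the file.

    #!/usr/bin/env python
    # Sketch: autocorrelation of the job-end-time histogram.
    # Assumes plain text, one count per line, one 1-minute bin per line;
    # adjust the parsing if the real file is laid out differently.
    import sys
    import numpy as np

    counts = np.loadtxt(sys.argv[1])        # jobs ending in each 1-minute bin
    counts = counts - counts.mean()         # remove the mean so peaks stand out

    # Normalized autocorrelation; a peak at lag k means the end-time clumps
    # tend to recur every k minutes.
    acf = np.correlate(counts, counts, mode="full")
    acf = acf[acf.size // 2:] / acf[acf.size // 2]

    for lag, r in enumerate(acf[:120]):     # first two hours of lags
        print("%4d min  %+.3f" % (lag, r))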

So, the question is, what might be causing the correlation of the job run times?

The start times were also available, and they do not indicate any induced "binning". That is, the controlling process isn't waiting for a long interval to pass and then starting a bunch of jobs all at once. Probably it spins on a wait() with a 1 second sleep() [it uses essentially no CPU time] and starts the next job as soon as one exits.
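
A minimal sketch of the sort of controller loop I'm imagining (this is a guess at the behaviour, not the actual scheduler code, and overlap_job.sh is just a stand-in for whatever it really launches):

    #!/usr/bin/env python
    # Hypothetical polling scheduler: keep MAX_JOBS running, check once a
    # second, and start a replacement as soon as any job exits.
    import subprocess
    import time

    MAX_JOBS = 40
    pending = ["./overlap_job.sh %d" % i for i in range(47634)]   # stand-in commands
    running = []

    while pending or running:
        running = [p for p in running if p.poll() is None]    # drop finished jobs
        while pending and len(running) < MAX_JOBS:
            running.append(subprocess.Popen(pending.pop(0), shell=True))
        time.sleep(1)    # why the controller shows essentially zero CPU time

A scheduler like this starts jobs at whatever instants earlier jobs happen to end, so by itself it shouldn't impose any binning on the start times.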

One possibility is that at the "leading" edge the first job to read a section of data does so slowly, while later jobs take the same data out of the page cache. That would lead to a "peloton" sort of effect, where the leader is slowed and the followers accelerated. iostat didn't show very much disk IO though.

Another possibility is that the jobs are fighting over the memory caches (each process is many GB in size) and that the contention somehow also syncs them.

My last guess is that the average run times in a given section of data may be fairly constant, and that with a bit of drift in some parts of the run they became synchronized by chance. The extent of synchronization seems too high for that, though: around minute 6500 half the jobs are ending at about the same time, and it stayed that way for around 1000 minutes.
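
A toy model along these lines (all numbers invented, just to make that hypothesis concrete) is one way to check how much clumping chance drift alone can produce with this kind of keep-N-running scheduler:

    #!/usr/bin/env python
    # Toy model: 40 slots, job durations nearly constant plus a little jitter,
    # each slot starting its next job the instant the previous one ends.
    # DURATION and JITTER are invented; vary them and see how much the
    # end times clump in a 1-minute histogram.
    import random

    random.seed(1)
    SLOTS, DURATION, JITTER, END = 40, 30.0, 0.5, 6000.0   # minutes

    ends = []
    clock = [random.uniform(0, DURATION) for _ in range(SLOTS)]  # staggered first end times
    while True:
        i = clock.index(min(clock))          # slot whose job finishes next
        t = clock[i]
        if t >= END:
            break
        ends.append(t)
        clock[i] = t + DURATION + random.gauss(0, JITTER)   # next job in that slot

    # 1-minute bins, same as the real histogram; print the last hour.
    histo = {}
    for t in ends:
        histo[int(t)] = histo.get(int(t), 0) + 1
    for minute in range(int(END) - 60, int(END)):
        print("%5d  %s" % (minute, "#" * histo.get(minute, 0)))

With independent slots and roughly uniform phases, getting half of 40 slots into the same 1-minute bin by chance is vanishingly unlikely, which is part of why the chance-drift explanation feels inadequate to me.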

Is this sort of thing common? What else could cause it?

System info: Dell PowerEdge T630, CentOS 6.9, two Xeon E5-2650 CPUs with 10 cores/CPU and 2 threads/core for 40 logical "CPUs", NUMA with even-numbered CPUs on node0 and odd on node1, 512 GB RAM, RAID5 with 4 disks for 11.7 TB.

Regards,

David Mathog
mat...@caltech.edu
Manager, Sequence Analysis Facility, Biology Division, Caltech
