On 4/3/2013 6:33 PM, jmaher wrote:
I looked at the data used to calculate the offenders, and I found:

job type, total jobs, total duration (seconds), total hours
try builders, 3525, 12239477, 3399.85472222
try testers, 71821, 121294315, 33692.8652778
inbound builders, 7862, 30877533, 8577.0925
inbound testers, 121641, 182883638, 50801.0105556
other builders, 14690, 26990702, 7497.41722222
other testers, 75170, 111729324, 31035.9233333
totals: 294709, 486014989.0, 135004.163611
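
As a rough cross-check of the table above, the sketch below recomputes the hours column and the average hours per job for each row. It assumes the duration column is in seconds (dividing it by 3600 reproduces the hours column); the names ROWS and summarize are made up for illustration and are not part of any existing tool.

```python
# (job type, total jobs, total duration), copied from the table above.
# Assumption: durations are in seconds.
ROWS = [
    ("try builders", 3525, 12239477),
    ("try testers", 71821, 121294315),
    ("inbound builders", 7862, 30877533),
    ("inbound testers", 121641, 182883638),
    ("other builders", 14690, 26990702),
    ("other testers", 75170, 111729324),
]


def summarize(rows):
    """Return {job type: (total hours, average hours per job)}."""
    result = {}
    for name, jobs, seconds in rows:
        hours = seconds / 3600.0
        result[name] = (hours, hours / jobs)
    return result


summary = summarize(ROWS)
total_jobs = sum(jobs for _, jobs, _ in ROWS)
total_hours = sum(seconds for _, _, seconds in ROWS) / 3600.0

for name, (hours, avg) in summary.items():
    print(f"{name:17s} {hours:10.1f} h  {avg:.2f} h/job")
print(f"{'totals':17s} {total_hours:10.1f} h  over {total_jobs} jobs")
```

The per-row averages printed here are the kind of per-job figures that the hours/test and hours/build estimates rely on.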


The sheriffs, releng, and I have been talking about this problem for the last month or two, knowing that we were running way in the red. We have a bunch of candidate solutions, but we haven't yet crunched the numbers to see which is best going forward. The answer will almost certainly be a combination of process changes and technical optimizations. What to focus on, and when, is the million-dollar question.

Joel and I did some calculations:
* 200 pushes/day[1]
* 325 test jobs/push
* 25 builds/push
* .41 hours/test (on average, from above numbers)
* 1.1 hours/build (on average, based on try values from above)

Then you can approximate what the load of Kat's suggestion would look like:

200 pushes/day * ((325 tests/push * .41 hrs/test) + (25 builds/push * 1.1 hrs/build)) = 32150 hrs/day

So we need 32150 compute hours per day to keep up.
Looking at the totals above for the week of data that gps provided, we are currently running at: 135004 hours/week / 7 days = 19286 compute hours/day
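
For anyone who wants to play with the inputs, here is a minimal sketch of the same napkin math. The constants are the estimates from this mail, and the function and variable names are invented for illustration; none of this is an existing releng tool.

```python
# Estimates quoted in this mail; adjust them to try other scenarios.
PUSHES_PER_DAY = 200
TEST_JOBS_PER_PUSH = 325
BUILDS_PER_PUSH = 25
HOURS_PER_TEST = 0.41
HOURS_PER_BUILD = 1.1
WEEKLY_HOURS = 135004.163611  # total hours from the week of data


def required_hours_per_day(pushes, tests, builds, hours_per_test, hours_per_build):
    """Compute hours needed per day if every push runs every test and build."""
    return pushes * (tests * hours_per_test + builds * hours_per_build)


required = required_hours_per_day(
    PUSHES_PER_DAY,
    TEST_JOBS_PER_PUSH,
    BUILDS_PER_PUSH,
    HOURS_PER_TEST,
    HOURS_PER_BUILD,
)
current = WEEKLY_HOURS / 7  # compute hours/day we actually ran
ratio = required / current

print(f"required: {required:.0f} hours/day")
print(f"current:  {current:.0f} hours/day")
print(f"ratio:    {ratio:.2f}x")
```

With these inputs the required load comes out at roughly 1.7 times what we ran per day during that week. Lowering PUSHES_PER_DAY (batching) or TEST_JOBS_PER_PUSH (running tests intermittently) shows how far each option would have to go to close the gap.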

So, while I really like Kat's proposal from a sheriffing standpoint, my back of the napkin math here makes me worry that our current infrastructure can't support it.

The only ways I see to make this approach work would be to batch patches together in order to reduce the number of pushes, or to run tests intermittently, as both dbaron and jmaher mentioned.

We need to figure out what we can do to make this better--it's a terrible problem. We're running way in the red, and we need to be more efficient with the infrastructure we've got while at the same time finding ways to expand it. There are expansion efforts underway, but expanding the physical infrastructure is not a short-term project. This is on our goals (both releng and ateam) for Q2, and any help crunching numbers to understand our most effective path forward is appreciated.

Clint
_______________________________________________
dev-platform mailing list
dev-platform@lists.mozilla.org
https://lists.mozilla.org/listinfo/dev-platform
