As part of a larger effort to improve the experience of debugging
intermittents, I've been looking at reducing the time that common "try"
workloads take for developers (so that, e.g., retriggering a job to
reproduce a failure can happen faster).
Of course, the common advice of "profile before you optimize" applies
here. That is to say, we want to spend time optimizing what makes this
particular task painful, rather than optimizing try in general.
To start with, I've been using a manually generated dump of the jobs
posted to try on Treeherder over a 10-day span. You can see some
preliminary results from my first pass over the data (the last job for
each try push, and the total machine time for each job) here:
http://nbviewer.jupyter.org/url/people.mozilla.org/%7Ewlachance/try%20analysis.ipynb
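For anyone who wants the gist without opening the notebook, here is a
minimal sketch of the two aggregations mentioned above. It is not the
notebook's code: the record layout (push_id, job_type, start, end, with
times in seconds) is a made-up stand-in for whatever the dump actually
contains, and "total machine time for each job" is read here as a total
per job type, which is an assumption.

```python
from collections import defaultdict

# Hypothetical job records; field names and values are illustrative only,
# not the real Treeherder schema. Times are in seconds.
jobs = [
    {"push_id": 1, "job_type": "build-linux64", "start": 0, "end": 1800},
    {"push_id": 1, "job_type": "mochitest-1", "start": 1900, "end": 3100},
    {"push_id": 2, "job_type": "build-linux64", "start": 100, "end": 2000},
    {"push_id": 2, "job_type": "mochitest-1", "start": 2100, "end": 3000},
    {"push_id": 2, "job_type": "mochitest-1", "start": 3200, "end": 4300},
]


def last_job_per_push(jobs):
    """Return, for each push, the job with the latest end time."""
    last = {}
    for job in jobs:
        current = last.get(job["push_id"])
        if current is None or job["end"] > current["end"]:
            last[job["push_id"]] = job
    return last


def machine_time_per_job_type(jobs):
    """Sum (end - start) over all jobs, grouped by job type."""
    totals = defaultdict(int)
    for job in jobs:
        totals[job["job_type"]] += job["end"] - job["start"]
    return dict(totals)


last = last_job_per_push(jobs)
totals = machine_time_per_job_type(jobs)
for push_id, job in sorted(last.items()):
    print(push_id, job["job_type"], job["end"])
for job_type, seconds in sorted(totals.items()):
    print(job_type, seconds)
```

The last job per push gives a rough proxy for how long a developer waits
on a push, while the per-type totals point at where machine time goes.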
Before I go much further, I'd love to know whether anyone has done
similar explorations recently, and whether any aggregation points other
than Treeherder (Taskcluster? Buildbot?) store this information. My
working assumption is that Treeherder is the best place to analyze it
(since it acts as a central point of aggregation), but I'm open to being
proven wrong.
Also, accounts of specific try workloads of this type which are
annoying/painful would be helpful. :) I think I have a rough idea of the
particular type of try push I'm looking for (not pushed by release
operations, at least one retrigger) but it would be great to get
firsthand confirmation of that.
Will
_______________________________________________
dev-platform mailing list
dev-platform@lists.mozilla.org
https://lists.mozilla.org/listinfo/dev-platform