Re: Consensus sought - when to reset try repository?

Gregory Szorc Tue, 04 Mar 2014 22:08:32 -0800

On 2/28/14, 5:24 PM, Hal Wine wrote:

tl;dr: what is the balance point between pushes to try taking too long
and losing repository history of recent try pushes?


Summary:
--------

As most developers have experienced, pushing to try can sometimes take a
long time. Once it takes "too long" (as measured by screams of pain in
#releng), a
"try [repository] reset" is scheduled. This hurts productivity and
increases frustration for everyone involved (devs, IT, RelEng). We don't
want to do this anymore.

A reset of the try repository deletes the existing contents and
replaces them with a fresh clone from mozilla-central. While the tbpl
information will remain valid for any completed build, any attempt to
view the diffs for a try build will fail (unless you already had them in
your local repository).

Progress on resolution of the root cause:
-----------------------------------------

IT has made tremendous progress in reducing the occurrence of "long push
times", but they still are not predictable. Various attempts at
monitoring[1] and auto correction[2] have not been successful in
improving the situation. Work continues on additional changes that
should improve the situation[3].

The most recent mitigation strategy is to trade the "unknown timing"
disruption of push times increasing to a pain threshold for the
"known timing" of resetting the try repository every TCW (tree closing
window - every 6 weeks currently). However, we heard from some folks that
this is too often.

The most recent pain-triggered try reset came after an interval of 6
months[4]. There was at least one report of problems just 3 months
after a reset[5].

So, the question is - what say developers -- what's the balance point
between:
  - too often, making collaborating on try pushes hard
  - too infrequent, introducing increasing push times

I wouldn't have such a big issue with Try resets if we didn't lose
information in the process. I believe every time there's been a Try
reset, I've lost data from a recent (<1 week) Try push and I needed to
re-run that job - incurring extra cost to Mozilla and wasting my time. I
also periodically find myself wanting to answer questions like "what
percentage of tree closures are due to pushes that didn't go to Try
first." Data loss stinks.

I'd say the goal should be "no data loss." I have an idea that will
enable us to achieve this.

Let's expose every newly-reset instance of the Try repo as a separate
URL. We would still push to ssh://hg.mozilla.org/try, but the URLs
printed and the URLs used by automation would be URLs to repos that
would never go away. e.g.
https://hg.mozilla.org/tries/try1/rev/840f122d1286 ("try1" being the
important bit in there). When we reset Try, you'd hand out URLs to
"try2." You could reset the writable Try repo as frequently as you
desired and aside from a slightly different repo URL being given out,
nobody should notice.
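
To make the URL scheme above concrete, here is a rough Python sketch.
Everything in it is hypothetical: the TryUrlMinter class, its method
names, and the second changeset hash are made up for illustration, and
the "tries/tryN" layout is only the example URL from this message, not
something hg.mozilla.org serves today.

```python
# Assumed prefix, taken from the example URL in this message.
BASE_URL = "https://hg.mozilla.org/tries"


class TryUrlMinter:
    """Hypothetical helper that hands out permanent URLs for Try pushes."""

    def __init__(self, generation=1):
        # Number of the archived repo that the writable Try currently maps to.
        self.generation = generation

    def reset(self):
        """Model a Try reset: later pushes get URLs under the next repo."""
        self.generation += 1
        return self.generation

    def url_for(self, rev):
        """Return the URL to print for a changeset pushed right now."""
        return f"{BASE_URL}/try{self.generation}/rev/{rev}"


minter = TryUrlMinter()
before_reset = minter.url_for("840f122d1286")

minter.reset()
# "0123456789ab" is a made-up changeset hash for a push after the reset.
after_reset = minter.url_for("0123456789ab")

print(before_reset)
print(after_reset)
```

The point of the sketch is that a URL handed out before a reset keeps
naming "try1" forever, so it never has to be invalidated; only the
number used for new pushes changes.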

The main drawbacks of this approach that I can think of are all in
automation: parts of automation are very repo/URL centric and having
effectively dynamic URLs might break assumptions. But making automation
work against arbitrary URLs is a good thing, as it allows automation to
be more flexible and this allows people to experiment with alternate
repo hosting, landing tools, landing-integrated code review tools, etc
without requiring special involvement from RelEng. "Everything is a web
service and is self-service," etc.

_______________________________________________
dev-platform mailing list
dev-platform@lists.mozilla.org
https://lists.mozilla.org/listinfo/dev-platform
