I think the crux of reducing the time m-i is closed is either to have better monitoring of things like memory usage during testing, so we can see whether it is growing as the tests run, or to change the tests so that we no longer care about that assertion, or to hide the test on TBPL so the sheriffs ignore it.

The best answer is the first one: add more monitoring so we catch this sooner.
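Something like this is what I have in mind (a rough, hypothetical sketch in Python, assuming psutil is available on the test machines - not existing harness code): poll the browser process's resident memory between test chunks and emit a warning as soon as it grows past a threshold, so the trend shows up on TBPL well before we hit OOM oranges.

  import psutil  # assumption: psutil installed on the test slaves

  GROWTH_LIMIT_MB = 200  # hypothetical per-chunk growth threshold

  def check_memory_growth(browser_pid, baseline_rss_mb):
      # Return current RSS in MB; warn if it grew past the threshold.
      rss_mb = psutil.Process(browser_pid).memory_info().rss / (1024.0 * 1024.0)
      growth = rss_mb - baseline_rss_mb
      if growth > GROWTH_LIMIT_MB:
          # Start as a non-fatal log line; promote it to an orange once
          # we trust the numbers and thresholds.
          print("WARNING: RSS grew %.1f MB during this chunk" % growth)
      return rss_mb

The harness would call check_memory_growth() after each chunk, carrying the returned value forward as the new baseline, so steady growth across a run becomes visible in the logs.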

The one thing I think we should note is that we shouldn't push the sheriffs to re-open the tree when it's in a bad state. The sheriffs never take closing the tree lightly, but if something needs to land urgently you can always add the checkin-needed keyword to the bug and the sheriffs will land it for you ASAP.

David



On 20/11/2013 16:20, Robert Kaiser wrote:
Nicholas Nethercote schrieb:
It also assumes that we can back out stuff to fix
the problem; we tried that to some extent with the first OOM closure
-- it is the standard response to test failure, of course -- but it
didn't work.

Yes, the OOM issues that caused this closure are probably just a symptom of a larger problem.

We've seen a steady rise in OOM issues over quite some time now, most visibly as an increase in crashes with empty dumps. I called attention to that in bug 837835, but we couldn't track down a decent regression range (we mostly know in which 6-week cycle we had regressions, and we can make some assumptions to narrow things down a bit further on trunk, but not nearly well enough to get to candidate checkins). Because of that, this has been lingering without any real attempt at a fix, and from what I saw in the data, things have even gotten worse recently - and that's on the release channel, so whatever may have increased troubles on trunk around this closure comes on top of that.

Since in a lot of the cases we're seeing there is apparently too little memory available for Windows to even create a minidump, we have very little info about those issues - but we do have the additional annotations we send along with the crash report, and AFAIK those suggest that in many cases we're running out of virtual memory space but not necessarily out of physical memory. As I'm told, that can happen, for example, through VM fragmentation, or through bugs that map the same physical page over and over into virtual memory. We're not sure whether that's all in our own code or whether system code or (graphics?) driver code exposes issues to us there.
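To illustrate that second scenario, here is a minimal, purely illustrative sketch (Windows-only, Python with ctypes; not anything from our code): every view below maps the same 64 KiB of pagefile-backed physical pages, yet each view claims a fresh slice of the process's virtual address space, so a 32-bit process runs out of address space long before it runs out of physical memory.

  import ctypes
  from ctypes import wintypes

  kernel32 = ctypes.WinDLL("kernel32", use_last_error=True)
  kernel32.CreateFileMappingW.argtypes = [wintypes.HANDLE, ctypes.c_void_p,
                                          wintypes.DWORD, wintypes.DWORD,
                                          wintypes.DWORD, wintypes.LPCWSTR]
  kernel32.CreateFileMappingW.restype = wintypes.HANDLE
  kernel32.MapViewOfFile.argtypes = [wintypes.HANDLE, wintypes.DWORD,
                                     wintypes.DWORD, wintypes.DWORD,
                                     ctypes.c_size_t]
  kernel32.MapViewOfFile.restype = ctypes.c_void_p

  PAGE_READWRITE = 0x04
  FILE_MAP_READ = 0x0004
  VIEW_SIZE = 64 * 1024  # one allocation-granularity unit

  # One small pagefile-backed section: a single set of physical pages.
  section = kernel32.CreateFileMappingW(wintypes.HANDLE(-1), None,
                                        PAGE_READWRITE, 0, VIEW_SIZE, None)

  views = []
  while True:
      # Each view maps the same physical pages but claims a new 64 KiB
      # region of virtual address space.
      view = kernel32.MapViewOfFile(section, FILE_MAP_READ, 0, 0, 0)
      if not view:
          break  # address space exhausted; physical memory barely moved
      views.append(view)

  print("views mapped before exhausting address space: %d" % len(views))

(Run it as a 32-bit process; with a 2 GB address space it gives up after at most ~32k views, while Task Manager shows almost no growth in physical memory use.)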

From what I know, bsmedberg and dmajor are looking into those issues more closely, both because of the tree closure and because this has been a lingering stability issue for months. I'm sure any help with those efforts is appreciated, as these are tough issues, and there may be multiple problems that each contribute a share to the overall picture.

Making ourselves more efficient with memory sounds like a worthwhile goal overall anyhow (even though the specific bullet of running out of VM space can be dodged by switching to Win64, and/or by e10s giving us multiple processes that each have their own 32-bit virtual memory space - though I'm not sure whether those should or will be our primary solutions).

I think in other cases, where a clear cause of the tree-closing issue is easy to identify, a backout-based process can work better, but with these OOM issues there is no clear patch or patch set to point to. IMHO, we should work on the overall cluster of OOM issues, though.

KaiRo
_______________________________________________
dev-platform mailing list
dev-platform@lists.mozilla.org
https://lists.mozilla.org/listinfo/dev-platform
