Hi Gijs,
Last quarter I worked on a project that lets us trigger jobs across revision
ranges in various ways, trigger jobs multiple times, or even create the
missing builds needed to trigger a test job.
https://mozilla-ci-tools.readthedocs.org/en/latest

Some of the use cases:
https://mozilla-ci-tools.readthedocs.org/en/latest/use_cases.html

A basic example:
python scripts/trigger.py \
    --buildername "Rev5 MacOSX Yosemite 10.10 fx-team talos dromaeojs" \
    --rev e16054134e12 --back-revisions 10 --times 10

This tool does not fix everything, but we can add anything that is missing.
This quarter I will be focusing on adding TaskCluster support (most trunk-b2g 
automation is now running there).

I have some comments below:

On Monday, 8 December 2014 15:52:40 UTC-5, Gijs Kruitbosch  wrote:
> Because I've been working on a few of them and here's what I think would 
> make them a lot easier to fix, and therefore improve our test coverage 
> and make sheriffs much happier
> 
> 
> 1) make it easier to figure out from bugzilla/treeherder when and where 
> the failure first occurred
> - I don't want to know the first thing that got reported to bmo - IME, 
> that is not always the first time it happened, just the first time it 
> got filed.
> 
> In other words, can I query treeherder in some way (we have structured 
> logs now right, and all this stuff is in a DB somewhere?) with a test 
> name and a regex, to have it tell me where the test first failed with a 
> message matching that regex?
> 

Structured logs are not available everywhere yet, but we could use them where 
they are:
https://mozilla-ci-tools.readthedocs.org/en/latest/roadmap.html#determine-if-a-test-failed-in-a-job
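As a rough sketch of what that query could look like once you have a
structured log in hand (this assumes mozlog-style JSON lines with "action",
"test", "status" and "message" fields; the file name and test name below are
made up):

import json
import re

def first_matching_failure(log_path, test_name, pattern):
    """Return the first test_status/test_end entry for test_name whose
    failure message matches pattern, or None if there is no such entry."""
    regex = re.compile(pattern)
    with open(log_path) as f:
        for line in f:
            try:
                entry = json.loads(line)
            except ValueError:
                continue  # skip any unstructured lines mixed into the log
            if entry.get("action") not in ("test_status", "test_end"):
                continue
            if entry.get("test") != test_name:
                continue
            if entry.get("status") in ("FAIL", "TIMEOUT", "CRASH") and \
                    regex.search(entry.get("message") or ""):
                return entry
    return None

print(first_matching_failure("mochitest_raw.log",
                             "browser/base/content/test/general/browser_foo.js",
                             r"Test timed out"))

Run that over the logs of the jobs in a revision range and you effectively
get "where did this test first fail with this message".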

> 2) make it easier to figure out from bugzilla/treeherder when and where 
> the failure happens
> 
> Linux only? Debug only? (non-)e10s only?
> 
> These questions are reasonably OK to answer right now by expanding all 
> the TBPL comments and using 'find in page'.
> 
> Harder questions to figure out are:
> 
> How often does this happen on which platform? Id est, more likely to 
> happen on debug, linux, asan, ... ? This helps with figuring out optimal 
> strategies to test fixes and/or regression hunt
> 
> I'm thinking a table with OS vs. debug/opt/asan/pgo vs. e10s/non-e10s 
> and numbers in the cells would already go a long way.
> 
We could scrape the comments with bugsy and generate a table/summary to at 
least help choose the right target to trigger.
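For instance, something along these lines could produce a rough platform vs.
build type count from the comments already on a bug (a sketch only; it
assumes bugsy's get_comments()/text API and that the robot comments mention
the platform and build type somewhere in their text; the bug number is just
the one from your example below):

from collections import Counter

import bugsy  # pip install bugsy

# Very naive keyword matching over the starred comments; the real robot
# comment format may need smarter parsing than this.
PLATFORM_HINTS = ("linux64", "linux", "osx", "winxp", "win7", "win8")
TYPE_HINTS = ("debug", "asan", "pgo", "opt")

def failure_table(bug_id):
    bug = bugsy.Bugsy().get(bug_id)
    counts = Counter()
    for comment in bug.get_comments():
        text = comment.text.lower()
        platform = next((p for p in PLATFORM_HINTS if p in text), None)
        build_type = next((t for t in TYPE_HINTS if t in text), None)
        if platform and build_type:
            counts[(platform, build_type)] += 1
    return counts

for (platform, build_type), n in sorted(failure_table(1099095).items()):
    print("%-10s %-6s %d" % (platform, build_type, n))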

I don't know what the plans are for treeherder with regard to different views.

ActiveData could help in the future (this is what ahal mentioned in his reply 
IIUC):
https://wiki.mozilla.org/Auto-tools/Meetings/2015-03-23#.5BDONE.5D_Store_high-resolution_testcase_data_.28.22ActiveData.22.29_.5Bekyle.2C_ahal.5D


> 3) numbers on how frequently a test fails
> 
> "But we have this in orange-factor" I hear you say. Sure, but that tells 
> me how often it got starred, not a percentage ("failed 1% of the time on 
> Linux debug, 2% of the time on Windows 7 pgo, ..."), and so I can't know 
> how often to retrigger until I try. It also makes it hard to estimate 
> when the intermittent started being intermittent because it's rarely the 
> cset from (1) - given failure in 1 out of N runs, the likely regression 
> range is correlated with N (can't be bothered doing the exact 
> probability math right now).
> 
> This is an increasing problem because we run more and more jobs every 
> month, and so the threshold for annoyance for the sheriffs is getting 
> lower and lower.
> 
We could answer these with #1 or ActiveData.
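Once ActiveData is populated, a query along these lines could return the
per-platform failure counts directly (the endpoint, table and column names
here are assumptions on my part; the real schema may differ):

import requests

# Hypothetical ActiveData query: count failures of one test, grouped by
# platform and build type.
query = {
    "from": "unittest",
    "where": {"and": [
        {"eq": {"result.test":
                "browser/base/content/test/general/browser_foo.js"}},
        {"eq": {"result.ok": False}},
    ]},
    "groupby": ["build.platform", "build.type"],
    "limit": 1000,
}

response = requests.post("http://activedata.allizom.org/query", json=query)
response.raise_for_status()
print(response.json())

As a rule of thumb for the retrigger question: if a test fails roughly 1 in N
runs, you need about 3N green runs on a revision to be ~95% confident it is
unaffected, since (1 - 1/N)^(3N) is roughly e^-3, i.e. about 0.05.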

> 4) automate regression hunting (aka mozregression for intermittent 
> infra-only failures)
> 
> see https://bugzilla.mozilla.org/show_bug.cgi?id=1099095 for an example 
> of how this works manually. We have APIs for retriggering now, right? We 
> have APIs for distinguishing relevant failures in logs from unrelated 
> orange, too. With the above, it should even be possible to narrow down 
> which platforms to retrigger on (I ended up just using winxp/7/linux 
> debug because they seemed most prominent, but I was too lazy to manually 
> create (2)), and how often to retrigger to get reasonable confidence in 
> ranges (3).
> 
> Right now, doing this manually costs me probably a full day or two of my 
> time to actually pore over results and such, with obviously a lot more 
> time spent waiting on the retriggers themselves. Automating this can 
> reduce this to 10 minutes of putting together the data and setting off 
> the requisite automation, plus it could theoretically strategize to run 
> retriggers at non-peak times.
> 
#4 is mainly what my Q1 work helps with.

A quick and dirty approach would be a cron job that triggers the jobs you 
want during the weekend or overnight. A treeherder URL is generated for the 
builders you are triggering.
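
For example (a sketch only; the schedule, paths and revision are placeholders, 
and it just wraps the trigger.py invocation from the example above):

# weekend_triggers.py - re-runs the trigger.py call from the example above.
import subprocess

subprocess.check_call([
    "python", "scripts/trigger.py",
    "--buildername", "Rev5 MacOSX Yosemite 10.10 fx-team talos dromaeojs",
    "--rev", "e16054134e12",
    "--back-revisions", "10",
    "--times", "10",
], cwd="/path/to/mozilla_ci_tools")

with a crontab entry such as:

0 3 * * 6  python /path/to/weekend_triggers.py >> /tmp/trigger.log 2>&1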
