Hi Gijs,

I worked last quarter on a project that allows us to trigger jobs
across revision ranges in various ways, trigger jobs multiple times,
or even create the missing builds needed to trigger a test job:
https://mozilla-ci-tools.readthedocs.org/en/latest

Some of the use cases:
https://mozilla-ci-tools.readthedocs.org/en/latest/use_cases.html

A basic example:

python scripts/trigger.py \
  --buildername "Rev5 MacOSX Yosemite 10.10 fx-team talos dromaeojs" \
  --rev e16054134e12 --back-revisions 10 --times 10

This tool does not fix everything, but we can add anything that is
missing. This quarter I will be focusing on adding TaskCluster support
(most trunk-b2g automation is now running there).

I have some comments below.

On Monday, 8 December 2014 15:52:40 UTC-5, Gijs Kruitbosch wrote:
> Because I've been working on a few of them and here's what I think would
> make them a lot easier to fix, and therefore improve our test coverage
> and make sheriffs much happier
>
> 1) make it easier to figure out from bugzilla/treeherder when and where
> the failure first occurred
> - I don't want to know the first thing that got reported to bmo - IME,
> that is not always the first time it happened, just the first time it
> got filed.
>
> In other words, can I query treeherder in some way (we have structured
> logs now right, and all this stuff is in a DB somewhere?) with a test
> name and a regex, to have it tell me where the test first failed with a
> message matching that regex?

Structured logs are not everywhere yet, but we could use them where
they are available:
https://mozilla-ci-tools.readthedocs.org/en/latest/roadmap.html#determine-if-a-test-failed-in-a-job
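To make the regex query concrete, here is a rough sketch of the scan I
have in mind, assuming you have already downloaded a job's raw
structured log. The field names follow the mozlog format as I remember
it, so treat this as an illustration rather than a supported API:

import json
import re

def first_matching_failure(log_path, test_name, message_regex):
    """Scan a raw structured (mozlog) log for a failure in the given
    test whose message matches the regex. Assumes one JSON object per
    line; mozlog only includes an "expected" field on a record when
    the status differs from it, i.e. on unexpected results."""
    pattern = re.compile(message_regex)
    with open(log_path) as f:
        for line in f:
            try:
                record = json.loads(line)
            except ValueError:
                continue  # tolerate any non-JSON noise in the log
            if record.get("action") not in ("test_status", "test_end"):
                continue
            if "expected" not in record:
                continue  # result matched expectations; not a failure
            if record.get("test") == test_name and \
                    pattern.search(record.get("message") or ""):
                return record
    return None

Running that over the jobs in a revision range would tell you the
first revision where the test failed with that message.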
> 2) make it easier to figure out from bugzilla/treeherder when and where
> the failure happens
>
> Linux only? Debug only? (non-)e10s only?
>
> These questions are reasonably OK to answer right now by expanding all
> the TBPL comments and using 'find in page'.
>
> Harder questions to figure out are:
>
> How often does this happen on which platform? Id est, more likely to
> happen on debug, linux, asan, ... ? This helps with figuring out optimal
> strategies to test fixes and/or regression hunt
>
> I'm thinking a table with OS vs. debug/opt/asan/pgo vs. e10s/non-e10s
> and numbers in the cells would already go a long way.

We could scrape the comments with bugsy and generate a table/summary
to at least help choose the right targets to trigger. I don't know
what the plans are for treeherder with respect to different views.
ActiveData could help in the future (this is what ahal mentioned in
his reply, IIUC):
https://wiki.mozilla.org/Auto-tools/Meetings/2015-03-23#.5BDONE.5D_Store_high-resolution_testcase_data_.28.22ActiveData.22.29_.5Bekyle.2C_ahal.5D
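A very rough sketch of that scraping, assuming bugsy's get() /
get_comments() API; the bug number and keyword lists are placeholders,
and counting keyword hits in the TBPL robot comments only approximates
the real table:

from collections import Counter

import bugsy

# Placeholder keyword lists; extend as needed.
PLATFORMS = ["Linux", "Windows XP", "Windows 7", "OS X", "Android"]
BUILD_TYPES = ["asan", "pgo", "debug", "opt"]

def failure_table(bug_id):
    """Tally platform/build-type mentions across the comments of an
    intermittent-failure bug, approximating the OS vs.
    debug/opt/asan/pgo table described above."""
    bug = bugsy.Bugsy().get(bug_id)
    table = Counter()
    for comment in bug.get_comments():
        text = comment.text
        platform = next((p for p in PLATFORMS if p in text), "unknown")
        build = next((b for b in BUILD_TYPES if b in text), "unknown")
        table[(platform, build)] += 1
    return table

for (platform, build), count in failure_table(123456).most_common():
    print("%-12s %-6s %4d" % (platform, build, count))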
> 3) numbers on how frequently a test fails
>
> "But we have this in orange-factor" I hear you say. Sure, but that tells
> me how often it got starred, not a percentage ("failed 1% of the time on
> Linux debug, 2% of the time on Windows 7 pgo, ..."), and so I can't know
> how often to retrigger until I try. It also makes it hard to estimate
> when the intermittent started being intermittent because it's rarely the
> cset from (1) - given failure in 1 out of N runs, the likely regression
> range is correlated with N (can't be bothered doing the exact
> probability math right now).
>
> This is an increasing problem because we run more and more jobs every
> month, and so the threshold for annoyance for the sheriffs is getting
> lower and lower.

We could answer these questions with the approach from #1 or with
ActiveData.

> 4) automate regression hunting (aka mozregression for intermittent
> infra-only failures)
>
> see https://bugzilla.mozilla.org/show_bug.cgi?id=1099095 for an example
> of how this works manually. We have APIs for retriggering now, right? We
> have APIs for distinguishing relevant failures in logs from unrelated
> orange, too. With the above, it should even be possible to narrow down
> which platforms to retrigger on (I ended up just using winxp/7/linux
> debug because they seemed most prominent, but I was too lazy to manually
> create (2)), and how often to retrigger to get reasonable confidence in
> ranges (3).
>
> Right now, doing this manually costs me probably a full day or two of my
> time to actually pore over results and such, with obviously a lot more
> time spent waiting on the retriggers themselves. Automating this can
> reduce this to 10 minutes of putting together the data and setting off
> the requisite automation, plus it could theoretically strategize to run
> retriggers at non-peak times.

#4 is mainly what my Q1 work helps with. A quick-and-dirty approach to
the off-peak part would be a cron job that triggers the jobs you want
during the weekend or at night; a Treeherder URL is generated for the
builders you trigger, so you can check the results afterwards. See the
sketch below.
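Something like this, run from cron (e.g. "0 1 * * 6" for 01:00 on
Saturdays), would do. It just shells out to trigger.py with the flags
from my example above; the paths, builder name, revision, and counts
are all placeholders to swap for your own:

#!/usr/bin/env python
# weekend_retriggers.py -- run from cron, e.g.:
#   0 1 * * 6  python /path/to/weekend_retriggers.py
import subprocess

JOBS = [
    # (buildername, revision, times)
    ("Rev5 MacOSX Yosemite 10.10 fx-team talos dromaeojs",
     "e16054134e12", 20),
]

for buildername, rev, times in JOBS:
    # Invoke mozilla-ci-tools' documented CLI for each job.
    subprocess.check_call(
        ["python", "scripts/trigger.py",
         "--buildername", buildername,
         "--rev", rev,
         "--times", str(times)],
        cwd="/path/to/mozilla_ci_tools")  # adjust to your checkout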