On Fri, Sep 18, 2020 at 4:31 PM Dan Liew <d...@su-root.co.uk> wrote:
>
> On Wed, 16 Sep 2020 at 22:24, David Blaikie <dblai...@gmail.com> wrote:
> >
> > I appreciate the value of the feature - but it's possible the test
> > doesn't pull its weight. Is the code that implements the feature
> > liable to failure/often touched? If it's pretty static/failure is
> > unlikely, possibly the time and flaky failures aren't worth the value
> > of possibly catching a low-chance bug.
>
> I don't have many data points here. I haven't needed to touch the code
> or tests since 2018. I haven't been asked to review any patches
> related to the timeout feature since then, so I presume nobody has
> touched it since.
>
> > Another option might be to reduce how often/in which configurations
> > the test is run - LLVM_ENABLE_EXPENSIVE_CHECKS presumably only works
> > for code within LLVM itself, and not test cases - but maybe I'm wrong
> > there & this parameter could be used (& then the timing bumped up
> > quite a bit to try to make it much more reliable), or something
> > similar could be implemented at the lit check level?
> >
> > Ah, compiler-rt tests use EXPENSIVE_CHECKS to disable certain tests:
> >
> > ./compiler-rt/test/lit.common.configured.in:set_default("expensive_checks", @LLVM_ENABLE_EXPENSIVE_CHECKS_PYBOOL@)
> > ./compiler-rt/test/fuzzer/large.test:UNSUPPORTED: expensive_checks
> >
> > Could you bump the timeouts a fair bit and disable the tests except
> > under expensive checks?
>
> Only running that test in the "expensive_checks" configuration kind of
> abuses that configuration option, because the test is actually not an
> expensive check.
Ah, sorry - my point was to make it expensive: make it long enough that
it's more reliable even when the machine is under load. But making it
long makes it expensive, so then classify it as such.

> It's just that the test is sensitive to the
> performance of the running system. If the system is under heavy load
> when the test is running it's more likely to fail.
>
> One thing we could do to remove fragility in the test is to remove the
> running of `short.py` in the test. This is only invoked to check that
> it's possible for a command to run to completion in the presence of a
> fixed timeout. If we can live without testing that part (i.e. we only
> test that a timeout can be reached) then the test should be much more
> robust.

If you're on board with that, I think it's a reasonable trade-off
between test coverage, reliability, and development time.

> If for some reason that's not enough, another thing that might help
> with this problem is to not run the lit tests at the same time as all
> the other tests. The hope here is that the lit tests would run while
> the system load is lower. We could do this by changing the build
> system to:
>
> * Make all `check-*` targets (except `check-lit`) depend on the
>   `check-lit` target. This would ensure that lit gets tested before we
>   test everything else in LLVM.
> * Remove the lit test directory from inclusion in all `check-*`
>   targets (except `check-lit`) so we don't test lit twice.
>
> This also has a nice benefit of testing lit **before** we use it to
> test everything else in LLVM. Today we don't do that; we just run all
> the tests in one big lit invocation (i.e. we test lit at the same time
> that we are using it). It does have some downsides though:
>
> * If any of the lit tests fail we won't run the rest of the tests, so
>   you won't get the results of running the other tests until the lit
>   tests are fixed.
> * When running any of the `check-*` targets (apart from `check-lit`)
>   you have to wait for the lit tests to run before any other tests run.

I reckon that's probably not the best way to go. It wouldn't totally
fix this bug (the buildbot might be busy for other reasons, e.g.
running some regular maintenance, or the lit tests themselves might
still saturate the cores and slow down the test), and it would slow
down test iteration by adding another "pinch point" to the build graph:
waiting for the last of the lit tests to finish (a period of reduced
core utilization, as the last few tests wind down and no new work is
scheduled) before spinning all the cores up again to run the next batch
of tests.

- Dave

> What do you think?
>
> > On Wed, Sep 16, 2020 at 9:31 PM Dan Liew <d...@su-root.co.uk> wrote:
> > >
> > > Hi David,
> > >
> > > Unfortunately writing a reliable test is tricky given that the
> > > functionality we're trying to test involves timing. I would advise
> > > against disabling the test entirely because it actually tests
> > > functionality that people use. I'd suggest bumping up the time
> > > limits. This is what I've done in the past. See:
> > >
> > > commit 6dfcc78364fa3e8104d6e6634733863eb0bf4be8
> > > Author: Dan Liew <d...@su-root.co.uk>
> > > Date:   Tue May 22 15:06:29 2018 +0000
> > >
> > >     [lit] Try to make `shtest-timeout.py` test more reliable by
> > >     using a larger timeout value. This really isn't very good
> > >     because it will still be susceptible to machine performance.
> > >
> > >     While we are here also fix a bug in validation of
> > >     `maxIndividualTestTime` where previously it wasn't checked if
> > >     the type was an int.
> > >
> > >     rdar://problem/40221572
> > >
> > >     llvm-svn: 332987
> > >
> > > HTH,
> > > Dan.
> > > On Wed, 16 Sep 2020 at 09:37, David Blaikie <dblai...@gmail.com> wrote:
> > > >
> > > > Ping on this
> > > >
> > > > On Wed, Sep 9, 2020 at 8:27 PM David Blaikie <dblai...@gmail.com> wrote:
> > > > >
> > > > > The clang-cmake-armv8-lld (linaro-toolchain owners) buildbot is
> > > > > timing out trying to run some timeout tests (Dan Liew author):
> > > > >
> > > > > Pass: http://lab.llvm.org:8011/builders/clang-cmake-armv8-lld/builds/5672
> > > > > Fail: http://lab.llvm.org:8011/builders/clang-cmake-armv8-lld/builds/5673
> > > > >
> > > > > Is there anything we can do to the buildbot? Or the tests? (Bump
> > > > > up the time limits, or maybe remove the tests as unreliable?)

_______________________________________________
linaro-toolchain mailing list
linaro-toolchain@lists.linaro.org
https://lists.linaro.org/mailman/listinfo/linaro-toolchain