Thank you, Josh.

“I think it would be helpful if we always ran the repeated test jobs at
CircleCI when we add a new test or modify an existing one. Running those
jobs, when applicable, could be a requirement before committing. This
wouldn't help us when the changes affect many different tests or we are not
able to identify the tests affected by our changes, but I think it could
have prevented many of the recently fixed flakies.”

I would also love to see more of us verifying new tests by running them in a
loop before adding them to the code. A few of us mentioned this in the
discussion as a method we already use successfully, so I just want to raise it
again so it doesn’t slip off the list. :-)
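
For anyone who hasn’t tried it, below is a rough sketch of the kind of loop I
mean, just to make the idea concrete. The ant target, test name and run count
are only placeholders I made up for illustration; substitute whatever your
patch actually touches (or use the repeated-run jobs in CircleCI mentioned
above, which do this for you).

#!/usr/bin/env python3
# Rough sketch only: run one test N times locally and count failures.
# The command below is an illustrative placeholder, not project tooling;
# point it at whatever suite your change touches.
import subprocess
import sys

RUNS = 100  # something like 100-500 iterations, as suggested in this thread
CMD = ["ant", "testsome", "-Dtest.name=org.apache.cassandra.SomeFlakyTest"]

failures = 0
for i in range(1, RUNS + 1):
    result = subprocess.run(CMD, capture_output=True)
    if result.returncode != 0:
        failures += 1
        print(f"run {i}: FAILED (exit code {result.returncode})", file=sys.stderr)

print(f"{failures}/{RUNS} runs failed")
sys.exit(1 if failures else 0)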

Happy weekend everyone!

Best regards,
Ekaterina


On Fri, 5 Nov 2021 at 11:30, Joshua McKenzie <jmcken...@apache.org> wrote:

> To checkpoint this conversation and keep it going, the ideas I see
> in-thread (light editorializing by me):
> 1. Blocking PR merge on CI being green (viable for single branch commits,
> less so for multiple)
> 2. A change in our expected culture of "if you see something, fix
> something" when it comes to test failures on a branch (requires stable
> green test board to be viable)
> 3. Clearer merge criteria and potentially updates to circle config for
> committers in terms of "which test suites need to be run" (notably,
> including upgrade tests)
> 4. Integration of model and property based fuzz testing into the release
> qualification pipeline at least
> 5. Improvements in project dependency management, most notably in-jvm dtest
> API's, and the release process around that
>
> So a) Am I missing anything, and b) Am I getting anything wrong in the
> summary above?
>
> On Thu, Nov 4, 2021 at 9:01 AM Andrés de la Peña <adelap...@apache.org>
> wrote:
>
> > Hi all,
> >
> > > we already have a way to confirm flakiness on circle by running the test
> > > repeatedly N times. Like 100 or 500. That has proven to work very well
> > > so far, at least for me. #collaborating #justfyi
> >
> >
> > I think it would be helpful if we always ran the repeated test jobs at
> > CircleCI when we add a new test or modify an existing one. Running those
> > jobs, when applicable, could be a requirement before committing. This
> > wouldn't help us when the changes affect many different tests or we are not
> > able to identify the tests affected by our changes, but I think it could
> > have prevented many of the recently fixed flakies.
> >
> >
> > On Thu, 4 Nov 2021 at 12:24, Joshua McKenzie <jmcken...@apache.org>
> > wrote:
> >
> > > >
> > > > we noticed CI going from a
> > > > steady 3-ish failures to many and it's getting fixed. So we're moving in
> > > > the right direction imo.
> > > >
> > > An observation about this: there's tooling and technology widely in use to
> > > help prevent ever getting into this state (to Benedict's point: blocking
> > > merge on CI failure, or nightly tests and reverting regression commits,
> > > etc). I think there's significant time and energy savings for us in using
> > > automation to be proactive about the quality of our test boards rather
> > > than reactive.
> > >
> > > I 100% agree that it's heartening to see that the quality of the codebase
> > > is improving as is the discipline / attentiveness of our collective
> > > culture. That said, I believe we still have a pretty fragile system when
> > > it comes to test failure accumulation.
> > >
> > > On Thu, Nov 4, 2021 at 2:46 AM Berenguer Blasi <berenguerbl...@gmail.com>
> > > wrote:
> > >
> > > > I agree with David. CI has been pretty reliable besides the random
> > > > jenkins going down or timeout. The same 3 or 4 tests were the only flaky
> > > > ones in jenkins and Circle was very green. I bisected a couple failures
> > > > to legit code errors, David is fixing some more, others have as well, etc
> > > >
> > > > It is good news imo as we're just getting to learn our CI post 4.0 is
> > > > reliable and we need to start treating it as so and paying attention to
> > > > its reports. Not perfect but reliable enough it would have prevented
> > > > those bugs getting merged.
> > > >
> > > > In fact we're having this conversation bc we noticed CI going from a
> > > > steady 3-ish failures to many and it's getting fixed. So we're moving in
> > > > the right direction imo.
> > > >
> > > > On 3/11/21 19:25, David Capwell wrote:
> > > > >> It’s hard to gate commit on a clean CI run when there’s flaky tests
> > > > > I agree, this is also why so much effort was done in 4.0 release to
> > > > > remove as much as possible.  Just over 1 month ago we were not really
> > > > > having a flaky test issue (outside of the sporadic timeout issues; my
> > > > > circle ci runs were green constantly), and now the “flaky tests” I see
> > > > > are all actual bugs (been root causing 2 out of the 3 I reported) and
> > > > > some (not all) of the flakiness was triggered by recent changes in the
> > > > > past month.
> > > > >
> > > > > Right now people do not believe the failing test is caused by their
> > > > > patch and attribute to flakiness, which then causes the builds to
> > > > > start being flaky, which then leads to a different author coming to
> > > > > fix the issue; this behavior is what I would love to see go away.  If
> > > > > we find a flaky test, we should do the following
> > > > >
> > > > > 1) has it already been reported and who is working to fix?  Can we
> > > > > block this patch on the test being fixed?  Flaky tests due to timing
> > > > > issues normally are resolved very quickly, real bugs take longer.
> > > > > 2) if not reported, why?  If you are the first to see this issue then
> > > > > good chance the patch caused the issue so should root cause.  If you
> > > > > are not the first to see it, why did others not report it (we tend to
> > > > > be good about this, even to the point Brandon has to mark the new
> > > > > tickets as dups…)?
> > > > >
> > > > > I have committed when there was flakiness, and I have caused
> > > > > flakiness; not saying I am perfect or that I do the above, just saying
> > > > > that if we all moved to the above model we could start relying on CI.
> > > > > The biggest impact to our stability is people actually root causing
> > > > > flaky tests.
> > > > >
> > > > >>  I think we're going to need a system that
> > > > >> understands the difference between success, failure, and timeouts
> > > > >
> > > > > I am curious how this system can know that the timeout is not an
> > > > > actual failure.  There was a bug in 4.0 with time serialization in
> > > > > message, which would cause the message to get dropped; this presented
> > > > > itself as a timeout if I remember properly (Jon Meredith or Yifan Cai
> > > > > fixed this bug I believe).
> > > > >
> > > > >> On Nov 3, 2021, at 10:56 AM, Brandon Williams <dri...@gmail.com>
> > > > >> wrote:
> > > > >>
> > > > >> On Wed, Nov 3, 2021 at 12:35 PM bened...@apache.org
> > > > >> <bened...@apache.org> wrote:
> > > > >>> The largest number of test failures turn out (as pointed out by
> > > > >>> David) to be due to how arcane it was to trigger the full test
> > > > >>> suite. Hopefully we can get on top of that, but I think a significant
> > > > >>> remaining issue is a lack of trust in the output of CI. It’s hard to
> > > > >>> gate commit on a clean CI run when there’s flaky tests, and it
> > > > >>> doesn’t take much to misattribute one failing test to the existing
> > > > >>> flakiness (I tend to compare to a run of the trunk baseline for
> > > > >>> comparison, but this is burdensome and still error prone). The more
> > > > >>> flaky tests there are the more likely this is.
> > > > >>>
> > > > >>> This is in my opinion the real cost of flaky tests, and it’s
> > > > >>> probably worth trying to crack down on them hard if we can. It’s
> > > > >>> possible the Simulator may help here, when I finally finish it up,
> > > > >>> as we can port flaky tests to run with the Simulator and the failing
> > > > >>> seed can then be explored deterministically (all being well).
> > > > >> I totally agree that the lack of trust is a driving problem here,
> > > > >> even in knowing which CI system to rely on. When Jenkins broke but
> > > > >> Circle was fine, we all assumed it was a problem with Jenkins, right
> > > > >> up until Circle also broke.
> > > > >>
> > > > >> In testing a distributed system like this I think we're always going
> > > > >> to have failures, even on non-flaky tests, simply because the
> > > > >> underlying infrastructure is variable with transient failures of its
> > > > >> own (the network is reliable!)  We can fix the flakies where the
> > > > >> fault is in the code (and we've done this to many already) but to get
> > > > >> more trustworthy output, I think we're going to need a system that
> > > > >> understands the difference between success, failure, and timeouts,
> > > > >> and in the latter case knows how to at least mark them differently.
> > > > >> Simulator may help, as do the in-jvm dtests, but there is ultimately
> > > > >> no way to cover everything without doing some things the hard, more
> > > > >> realistic way where sometimes shit happens, marring the
> > > > >> almost-perfect runs with noisy doubt, which then has to be sifted
> > > > >> through to determine if there was a real issue.
> > > > >>