There are a couple more points to be made here, I think.

First, we've also gone to a great deal of effort to make upgrading
seamless, and we recently (1.0.3) added support for seamless
downgrading as well.  Anyone with a staging cluster (which should be
everyone) can drop 1.0.4 on a single node, see if there are any
problems, and roll back to 1.0.3 if there are.  That is, as near as I
can tell, what happened.  Granted, it's always better to not release
bugs at all, but it happens to *everyone*, so defense in depth is a
Good Thing.
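As a rough sketch of that canary workflow (the host, package, and
service names here are illustrative assumptions, and the install
commands will vary by platform; nodetool drain/ring are real
Cassandra commands, the rest you'd adapt to your environment):

```shell
#!/bin/sh
# Canary-test a new Cassandra release on a single staging node,
# with a rollback path if problems show up.  Host, package, and
# service names are hypothetical; adjust for your setup.

canary_upgrade() {
  node="$1"; new_ver="$2"; old_ver="$3"

  # 1. Flush memtables and stop the node cleanly before upgrading.
  ssh "$node" "nodetool drain && sudo service cassandra stop"

  # 2. Install the candidate release and restart.
  ssh "$node" "sudo apt-get install -y cassandra=$new_ver"
  ssh "$node" "sudo service cassandra start"

  # 3. Watch the logs and verify the node rejoined the ring:
  #      ssh "$node" "nodetool ring"
  #    If anything looks wrong, downgrade (seamless since 1.0.3)
  #    by repeating steps 1-2 with $old_ver instead of $new_ver.
}

# Example invocation against a hypothetical staging host:
#   canary_upgrade staging-node-1 1.0.4 1.0.3
```

The point is less the exact commands than the shape: one node at a
time, a clean drain before touching packages, and a known-good
version to fall back to.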

Second, I don't think you're going to be able to mandate more
prerelease community testing.  It's kind of a law of nature that the
bulk of testing happens post-release; whether in databases, web
frameworks, or OS kernels, in new projects or mature ones, you see the
same pattern everywhere.  (A big thank you to everyone who *did* test
prerelease 1.0.x artifacts, you guys are awesome!)

IMO the best we can do is get more automated coverage of the
distributed side of things.  We've had a framework for this in-tree
for a while, but it's so painful to actually write tests for that we
only have a handful.  DataStax has been working on a next-gen dtest
framework to improve this situation -- Sylvain just posted about that,
so I'll defer to that thread now.

On Tue, Nov 29, 2011 at 5:16 PM, Jeremy Hanna
<jeremy.hanna1...@gmail.com> wrote:
> I'd like to start a discussion about ideas to improve release quality for 
> Cassandra.  Specifically I wonder if the community can do more to help the 
> project as a whole become more solid.  Cassandra has an active and vibrant 
> community using Cassandra for a variety of things.  If we all pitch in a 
> little bit, it seems like we can make a difference here.
>
> Release quality is difficult, especially for a distributed system like 
> Cassandra.  The core devs have done an amazing job with this considering how 
> complicated it is.  Currently, there are several things in place to make sure 
> that a release is generally usable:
> - review-then-commit
> - 72 hour voting period
> - at least 3 binding +1 votes
> - unit tests
> - integration tests
> Then there is the personal responsibility aspect - testing a release in a 
> staging environment before pushing it to production.
>
> I wonder if more could be done here to give more confidence in releases.  I 
> wanted to see if there might be ways that the community could help out 
> without being too burdensome on either the core devs or the community.
>
> Some ideas:
> More automation: run YCSB and stress with various setups.  Maybe people can 
> rotate donating cloud instances (or simply money for them) but have a common 
> set of scripts to do this in the source.
>
> Dedicated distributed test suite: I know there has been work done on various 
> distributed test suites (which is great!) but none have really caught on so 
> far.
>
> I know what the Apache guidelines say, but what if the community could help 
> out with the testing effort in a more formal way?  For example, for each 
> release to be finalized, what if there needed to be 3 community members who 
> tried it out in their own environment?
>
> What if there was a post release +1 vote for the community to sign off on - 
> sort of a "works for me" kind of thing to reassure others that it's safe to 
> try.  So when the release email gets posted to the user list, start a 
> tradition of people saying +1 in reply if they've tested it out and it works 
> for them.  That's happening informally now when there are problems, but it 
> might be nice to see a vote of confidence.  Just another idea.
>
> Any other ideas or variations?



-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of DataStax, the source for professional Cassandra support
http://www.datastax.com