There are a couple more points to be made here, I think. First, we've also gone to a great deal of effort to make upgrading seamless, and we recently (1.0.3) added support for seamless downgrading as well. Anyone with a staging cluster (which should be everyone) can drop 1.0.4 on a single node, see if there are any problems, and roll back to 1.0.3 if there are. Which is, as near as I can tell, what happened. Granted, it's always better not to release bugs at all, but it happens to *everyone*, so defense in depth is a Good Thing.
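For anyone who hasn't scripted the canary dance yet, it's roughly the following. (This is just a sketch: it assumes a Debian-style install where `apt-get install cassandra=<version>` pins the package, and it's wrapped in a dry-run guard so nothing fires by accident; adapt the package and service names to your own setup.)

```shell
#!/bin/sh
# Canary upgrade/rollback sketch for one staging node.
# Illustrative only: package name "cassandra" and the service commands
# assume a Debian-style install; adjust for your environment.
DRY_RUN=1   # set to empty to actually execute the commands

# Print each command; only execute it when DRY_RUN is unset.
run() { echo "+ $*"; [ -n "$DRY_RUN" ] || "$@"; }

# 1. Flush memtables and stop accepting traffic on the canary node.
run nodetool -h localhost drain
run sudo service cassandra stop

# 2. Upgrade just this one node to the new release.
run sudo apt-get install -y cassandra=1.0.4
run sudo service cassandra start

# 3. Watch logs/metrics; if anything looks wrong, roll back
#    (downgrading is seamless as of 1.0.3).
run sudo service cassandra stop
run sudo apt-get install -y cassandra=1.0.3
run sudo service cassandra start
```

`nodetool drain` flushes memtables and stops the node accepting writes first, so the restart is clean either direction.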
Second, I don't think you're going to be able to mandate more prerelease community testing. It's kind of a law of nature that the bulk of testing happens post-release; whether in databases, web frameworks, or OS kernels, new projects or mature, you see the same pattern everywhere. (A big thank you to everyone who *did* test prerelease 1.0.x artifacts, you guys are awesome!)

IMO the best we can do is get more automated coverage of the distributed side of things. We've had a framework for this in-tree for a while, but it's so incredibly painful to actually write tests for that we only have a handful. DataStax has been working on a next-gen dtest framework to improve this situation -- Sylvain just posted about that, so I'll defer to that thread now.

On Tue, Nov 29, 2011 at 5:16 PM, Jeremy Hanna <jeremy.hanna1...@gmail.com> wrote:
> I'd like to start a discussion about ideas to improve release quality for
> Cassandra. Specifically, I wonder if the community can do more to help the
> project as a whole become more solid. Cassandra has an active and vibrant
> community using Cassandra for a variety of things. If we all pitch in a
> little bit, it seems like we can make a difference here.
>
> Release quality is difficult, especially for a distributed system like
> Cassandra. The core devs have done an amazing job with this considering how
> complicated it is. Currently, there are several things in place to make sure
> that a release is generally usable:
> - review-then-commit
> - 72 hour voting period
> - at least 3 binding +1 votes
> - unit tests
> - integration tests
> Then there is the personal responsibility aspect - testing a release in a
> staging environment before pushing it to production.
>
> I wonder if more could be done here to give more confidence in releases. I
> wanted to see if there might be ways that the community could help out
> without being too burdensome on either the core devs or the community.
>
> Some ideas:
>
> More automation: run YCSB and stress with various setups. Maybe people can
> rotate donating cloud instances (or simply money for them), but have a common
> set of scripts to do this in the source.
>
> Dedicated distributed test suite: I know there has been work done on various
> distributed test suites (which is great!), but none have really caught on so
> far.
>
> I know what the Apache guidelines say, but what if the community could help
> out with the testing effort in a more formal way? For example, for each
> release to be finalized, what if 3 community members needed to try it out in
> their own environments?
>
> What if there was a post-release +1 vote for the community to sign off on -
> sort of a "works for me" kind of thing to reassure others that it's safe to
> try. So when the release email gets posted to the user list, start a
> tradition of people saying +1 in reply if they've tested it out and it works
> for them. That's happening informally now when there are problems, but it
> might be nice to see a vote of confidence. Just another idea.
>
> Any other ideas or variations?

--
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of DataStax, the source for professional Cassandra support
http://www.datastax.com