Re: Cluster fragility

2010-11-13 Thread Reverend Chip
OK. Reconstructing the past failures is impractical, but I'm prepared for next time.
On 11/12/2010 6:38 PM, Jonathan Ellis wrote:
> These are not expected. In order of increasing utility of fixing it
> we could use
>
> - INFO level logs from when something went wrong; when streaming,
> both sou

Re: Cluster fragility

2010-11-12 Thread Jonathan Ellis
These are not expected. In order of increasing utility of fixing it we could use
- INFO level logs from when something went wrong; when streaming, both source and target
- DEBUG level logs
- instructions for how to reproduce
On Thu, Nov 11, 2010 at 7:46 PM, Reverend Chip wrote:
> I've been r

Re: Cluster fragility

2010-11-12 Thread Dave Gardner
We never have to reboot our production cluster. However, we're not running a beta version but a release version (0.6.6). If your aim is to avoid fragility, it would seem sensible to run a release version as a good starting point.
dave
On Friday, November 12, 2010, Reverend Chip wrote:
> I've been

Cluster fragility

2010-11-11 Thread Reverend Chip
I've been running tests with first a four-node, then an eight-node cluster. I started with 0.7.0 beta3, but have since updated to a more recent Hudson build. I've been happy with a lot of things, but I've had some surprisingly unpleasant experiences with operational fragility. For example, w