Re: ReplicateOnWriteStage exception causes a backlog in MutationStage that never clears

2012-03-23 Thread Thomas van Neerijnen
The main issue turned out to be a bug in our code: we were writing a lot of new columns to the same row key instead of to a new row key, turning what we expected to be a skinny-row CF into a CF with one very, very wide row. These writes on the single key were putting pressure on the 3 nodes h

Re: ReplicateOnWriteStage exception causes a backlog in MutationStage that never clears

2012-03-21 Thread Thomas van Neerijnen
Hi, I'm going with yes to all three of your questions. I found a very heavily hit index, which we have since reworked to remove the secondary index entirely. This fixed a large portion of the problem, but during the panic of the overloaded cluster we did the simple scaling-out trick of doubling the c

Re: ReplicateOnWriteStage exception causes a backlog in MutationStage that never clears

2012-03-21 Thread aaron morton
The node is overloaded with hints. I'll just grab the comments from code…
// avoid OOMing due to excess hints. we need to do this check even for "live" nodes, since we can
// still generate hints for those if it's overloaded or simply dead but not yet known-to-be-dead

ReplicateOnWriteStage exception causes a backlog in MutationStage that never clears

2012-03-21 Thread Thomas van Neerijnen
Hi all, I'm running into a weird error on Cassandra 1.0.7. As my cluster's load gets heavier, many of the nodes seem to hit the same error around the same time, resulting in MutationStage backing up and never clearing down. The only way to recover the cluster is to kill all the nodes and start them u