On 12/8/2010 7:30 AM, Jonathan Ellis wrote:
> On Tue, Dec 7, 2010 at 4:00 PM, Reverend Chip wrote:
>> Full DEBUG level logs would be a space problem; I'm loading at least 1T
>> per node (after 3x replication), and these events are rare. Can the
>> DEBUG logs be limited to the specific modules helpful for this diagnosis
>> of the gossip problem and, secondarily, the failure to report
>> replication failure?
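
One way to keep the volume down is to raise the log level only for the packages involved instead of turning on DEBUG globally. A minimal sketch, assuming the stock conf/log4j-server.properties that ships with 0.7, and assuming the gossip and streaming packages (org.apache.cassandra.gms and org.apache.cassandra.streaming) are the ones that matter for this diagnosis:

    # conf/log4j-server.properties -- leave the root logger at INFO
    log4j.rootLogger=INFO,stdout,R

    # DEBUG only for the packages relevant to the diagnosis
    log4j.logger.org.apache.cassandra.gms=DEBUG
    log4j.logger.org.apache.cassandra.streaming=DEBUG

Package names are worth double-checking against the build actually being run.
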
> On Tue, Dec 7, 2010 at 2:37 PM, Reverend Chip wrote:
>> No,
> https://issues.apache.org/jira/browse/CASSANDRA-1804 which is fixed in
> rc2.
>
> On Mon, Dec 6, 2010 at 6:58 PM, Reverend Chip wrote:
>> I'm running a big test -- ten nodes with 3T disk each. I'm using
>> 0.7.0rc1. After some tuning help (thanks Tyler) lots of this is working
>> as it should. However a serious event occurred as well -- the server
>> froze up -- and though mutations were dropped, no error was reported to
>> the client. Here'
On 11/15/2010 2:01 PM, Jonathan Ellis wrote:
> On Mon, Nov 15, 2010 at 3:05 PM, Reverend Chip wrote:
>>
>> There are a lot of non-tmps that were not included in the load
>> figure. Having stopped the server and deleted tmp files, the data are
>> still using way mor
On 11/15/2010 12:09 PM, Jonathan Ellis wrote:
> On Mon, Nov 15, 2010 at 1:03 PM, Reverend Chip wrote:
>> I find X.21's data disk is full. "nodetool ring" says that X.21 has a
>> load of only 326.2 GB, but the 1T partition is full.
> Load only tracks live data --
On 11/15/2010 12:13 PM, Rob Coli wrote:
> On 11/15/10 12:08 PM, Reverend Chip wrote:
>>> "
>>> logger_.warn("Unable to lock JVM memory (ENOMEM)."
>>> or
>>> logger.warn("Unknown mlockall error " + errno(e));
>>> "
>
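
For what it's worth, the ENOMEM branch of that warning normally means mlockall() ran into the locked-memory rlimit, so the JVM heap cannot be pinned and stays swappable. A minimal sketch of the usual fix, assuming the daemon runs as user cassandra (adjust the user name):

    # check the limit in the environment the daemon actually starts from
    ulimit -l

    # /etc/security/limits.conf -- let the cassandra user lock unlimited memory
    cassandra  soft  memlock  unlimited
    cassandra  hard  memlock  unlimited

After raising the limit, the daemon has to be restarted from a fresh login or init script for the new rlimit to take effect.
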
On 11/15/2010 11:34 AM, Rob Coli wrote:
> On 11/13/10 11:59 AM, Reverend Chip wrote:
>> Swapping could conceivably be a
>> factor; the JVM is 32G out of 72G, but the machine is 2.5G into swap
>> anyway. I'm going to disable swap and see if the gossip issues resolve.
>
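
A minimal sketch of what "disable swap" usually looks like on Linux, assuming root access and that nothing else on the box depends on it:

    sudo swapoff -a        # drop all active swap now
    # then comment out the swap line(s) in /etc/fstab so it stays off across reboots

    # softer alternative: keep swap but strongly discourage its use
    echo 'vm.swappiness = 0' | sudo tee -a /etc/sysctl.conf
    sudo sysctl -p

Either way, a JVM heap that ends up partially in swap can stall long enough for other nodes' failure detectors to mark the node dead, which looks exactly like the gossip flapping described elsewhere in this thread.
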
On 11/15/2010 10:30 AM, Jonathan Ellis wrote:
> Is X.20 spewing these errors constantly now?
Yes.
> Did X.21 log anything when/before the errors started on X.20?
I find X.21's data disk is full. "nodetool ring" says that X.21 has a
load of only 326.2 GB, but the 1T partition is full.
When I tra
Did I answer the question sufficiently? I need repair to work, and the
cluster is sick.
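
On the load-versus-disk gap above: the figure nodetool ring reports only counts live SSTables, so obsolete SSTables waiting to be deleted after compaction, tmp files from interrupted compactions or streams, and old snapshots all consume disk without showing up in it. A minimal sketch for locating the difference, assuming the default /var/lib/cassandra/data layout (adjust paths and the host name):

    nodetool -h X.21 ring                              # "load" = live SSTables only
    du -sh /var/lib/cassandra/data/*                   # real on-disk usage per keyspace
    find /var/lib/cassandra/data -name '*tmp*' -ls     # leftovers from aborted compactions/streams
    du -sh /var/lib/cassandra/data/*/snapshots 2>/dev/null   # snapshot copies, if any
    nodetool -h X.21 clearsnapshot                     # reclaims snapshot space
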
On 11/14/2010 2:17 PM, Jonathan Ellis wrote:
> What exception is causing it to fail/retry?
>
> On Sun, Nov 14, 2010 at 3:49 PM, Chip Salzenberg wrote:
>> My by-now infamous eight-node cluster running 0.7.0bet
> streaming,
> both source and target
> - DEBUG level logs
> - instructions for how to reproduce
>
> On Thu, Nov 11, 2010 at 7:46 PM, Reverend Chip wrote:
>> I've been running tests with first a four-node, then an eight-node
>> cluster. I started with 0.7.0 beta3, but ha
On 11/12/2010 6:46 PM, Jonathan Ellis wrote:
> On Fri, Nov 12, 2010 at 3:19 PM, Chip Salzenberg wrote:
>> After I rebooted my 0.7.0beta3+ cluster to increase threads (read=100
>> write=200 ... they're beefy machines) and put them under load again, I
>> find gossip reporting yoyo up-down-up-do
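
For reference, those thread counts correspond to two settings in cassandra.yaml on 0.7; a minimal sketch with the values from the message above (not a recommendation):

    # cassandra.yaml
    concurrent_reads: 100     # read-stage threads; the stock file suggests roughly 16 per data disk
    concurrent_writes: 200    # mutation-stage threads; the stock file suggests roughly 8 per core

Raising these only helps while the stages themselves are the bottleneck; if reads or writes are already waiting on disk or GC, bigger pools mostly just queue more work.
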
I've been running tests with first a four-node, then an eight-node
cluster. I started with 0.7.0 beta3, but have since updated to a more
recent Hudson build. I've been happy with a lot of things, but I've had
some surprisingly unpleasant experiences with operational fragility.
For example, w
On 11/6/2010 8:26 PM, Jonathan Ellis wrote:
> On Sat, Nov 6, 2010 at 4:51 PM, Reverend Chip wrote:
>> Am I to understand that
>> ring maintenance requests can just fail when partially complete, in the
>> same manner as a regular insert might fail, perhaps due to inter-node
More weirdness with my four-or-five-node cluster of 0.7 beta3. Having
brought up all five nodes, including the one that didn't loadbalance
right, I tried loadbalancing it again. (This is under completely idle
conditions - no external reads or writes.) The result is a cluster
where each node thi
On 11/6/2010 1:48 PM, Jonathan Ellis wrote:
> On Fri, Nov 5, 2010 at 8:03 PM, Chip Salzenberg wrote:
>> In the below "nodetool ring" output, machine 18 was told to loadbalance over
>> an hour ago. It won't actually leave the ring. When I first told it to
>> loadbalance, the cluster was under hea
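
On the loadbalance that never leaves: in 0.7, nodetool loadbalance is essentially a decommission (the node streams its ranges to the remaining nodes) followed by a re-bootstrap at a new token, so a node that stays "in the ring" usually means the streaming phase stalled. A minimal sketch for watching it from the outside (host is a placeholder; the streaming subcommand name varies by build, "streams" in older nodetools, "netstats" in newer ones):

    nodetool -h X.18 ring       # the moving node should show Leaving, then rejoin as Joining
    nodetool -h X.18 streams    # or "netstats": what is being sent/received, and to/from whom
    nodetool -h X.18 tpstats    # pending/active counts; a wedged stage shows up here
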