Re: Node OOM Problems

2010-08-23 Thread Wayne
Thanks for the confirmation this is NOT the way to go. I will stick with 4 disks raid 0 with a single data directory. On Mon, Aug 23, 2010 at 9:24 PM, Rob Coli wrote: > On 8/22/10 12:00 AM, Wayne wrote: > >> Due to compaction being so expensive in terms of disk resources, does it >> make more s

Re: Node OOM Problems

2010-08-23 Thread Rob Coli
On 8/22/10 12:00 AM, Wayne wrote: Due to compaction being so expensive in terms of disk resources, does it make more sense to have 2 data volumes instead of one? We have 4 data disks in raid 0, would this make more sense to be 2 x 2 disks in raid 0? That way the reader and writer I assume would a

Re: Node OOM Problems

2010-08-22 Thread Benjamin Black
On Sun, Aug 22, 2010 at 2:03 PM, Wayne wrote: > From a testing whether cassandra can take the load long term I do not see it > as different. Yes bulk loading can be made faster using very different Then you need far more IO, whether it comes from faster drives or more nodes. If you can achieve 1

Re: Node OOM Problems

2010-08-22 Thread Wayne
From the standpoint of testing whether cassandra can take the load long term, I do not see it as different. Yes bulk loading can be made faster using very different methods, but my purpose is to test cassandra with a large volume of writes (and not to bulk load as efficiently as possible). I have scaled back to 5

Re: Node OOM Problems

2010-08-22 Thread Benjamin Black
Wayne, Bulk loading this much data is a very different prospect from needing to sustain that rate of updates indefinitely. As was suggested earlier, you likely need to tune things differently, including disabling minor compactions during the bulk load, to make this work efficiently. b On Sun,
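
If I remember the 0.6 tooling correctly, minor compactions can be paused per node by zeroing the compaction thresholds through nodetool; treat the exact syntax below as an assumption and check the usage output of the nodetool that ships with your build (the hostname is a placeholder):

    # pause minor compactions on this node for the duration of the bulk load
    nodetool -h node1.example.com setcompactionthreshold 0 0
    # ...run the load...
    # restore the stock thresholds (min 4, max 32) afterwards
    nodetool -h node1.example.com setcompactionthreshold 4 32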

Re: Node OOM Problems

2010-08-22 Thread Wayne
Has anyone loaded 2+ terabytes of real data in one stretch into a cluster without bulk loading and without any problems? How long did it take? What kind of nodes were used? How many writes/sec/node can be sustained for 24+ hours? On Sun, Aug 22, 2010 at 8:22 PM, Peter Schuller wrote: > I only

Re: Node OOM Problems

2010-08-22 Thread Peter Schuller
I only sifted recent history of this thread (for time reasons), but: > You have started a major compaction which is now competing with those > near constant minor compactions for far too little I/O (3 SATA drives > in RAID0, perhaps?).  Normally, this would result in a massive > ballooning of your

Re: Node OOM Problems

2010-08-22 Thread Benjamin Black
Is the need for 10k/sec/node just for bulk loading of data or is it how your app will operate normally? Those are very different things. On Sun, Aug 22, 2010 at 4:11 AM, Wayne wrote: > Currently each node has 4x1TB SATA disks. In MySQL we have 15tb currently > with no replication. To move this t

Re: Node OOM Problems

2010-08-22 Thread Edward Capriolo
On Sun, Aug 22, 2010 at 7:11 AM, Wayne wrote: > Currently each node has 4x1TB SATA disks. In MySQL we have 15tb currently > with no replication. To move this to Cassandra replication factor 3 we need > 45TB assuming the space usage is the same, but it is probably more. We had > assumed a 30 node c

Re: Node OOM Problems

2010-08-22 Thread Wayne
Currently each node has 4x1TB SATA disks. In MySQL we have 15tb currently with no replication. To move this to Cassandra replication factor 3 we need 45TB assuming the space usage is the same, but it is probably more. We had assumed a 30 node cluster with 4tb per node would suffice with head room f
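
For what it is worth, the arithmetic behind that sizing works out roughly as follows (my numbers, not from the thread):

    15 TB source data x replication factor 3   = 45 TB of live data
    45 TB / 30 nodes                           = 1.5 TB of live data per node
    4 TB raw per node - 1.5 TB live data       = 2.5 TB of headroom

The headroom matters because a major compaction rewrites sstables and can temporarily need free space on the order of the data being compacted, which is why the usual guidance is to keep roughly half the disk free.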

Re: Node OOM Problems

2010-08-22 Thread Benjamin Black
I see no reason to make that assumption. Cassandra currently has no mechanism to alternate in that manner. At the update rate you require, you just need more disk io (bandwidth and iops). Alternatively, you could use a bunch more, smaller nodes with the same SATA RAID setup so they each take many

Re: Node OOM Problems

2010-08-22 Thread Wayne
Due to compaction being so expensive in terms of disk resources, does it make more sense to have 2 data volumes instead of one? We have 4 data disks in raid 0, would this make more sense to be 2 x 2 disks in raid 0? That way the reader and writer I assume would always be a different set of spindles
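
For reference, the two-volume layout being proposed is just two DataFileDirectory entries in 0.6's storage-conf.xml (the paths below are made up); as Benjamin notes above, Cassandra will spread sstables across both directories but has no mechanism to dedicate one set of spindles to reads and the other to writes:

    <DataFileDirectories>
        <DataFileDirectory>/mnt/raid0-a/cassandra/data</DataFileDirectory>
        <DataFileDirectory>/mnt/raid0-b/cassandra/data</DataFileDirectory>
    </DataFileDirectories>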

Re: Node OOM Problems

2010-08-21 Thread Benjamin Black
How much storage do you need? 240G SSDs quite capable of saturating a 3Gbps SATA link are $600. Larger ones are also available with similar performance. Perhaps you could share a bit more about the storage and performance requirements. How SSDs to sustain 10k writes/sec PER NODE WITH LINEAR SCA

Re: Node OOM Problems

2010-08-21 Thread Wayne
Thank you for the advice, I will try these settings. I am running defaults right now. The disk subsystem is one SATA disk for commitlog and 4 SATA disks in raid 0 for the data. From your email you are implying this hardware can not handle this level of sustained writes? That kind of breaks down t

Re: Node OOM Problems

2010-08-21 Thread Benjamin Black
My guess is that you have (at least) 2 problems right now: You are writing 10k ops/sec to each node, but have default memtable flush settings. This is resulting in memtable flushing every 30 seconds (default ops flush setting is 300k). You thus have a proliferation of tiny sstables and are seein
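
In 0.6 those flush triggers are global settings in storage-conf.xml. A sketch of what raising them might look like, with illustrative values rather than a recommendation (larger memtables also mean a larger heap requirement):

    <!-- flush a memtable when it reaches this size or this many operations,
         whichever comes first; the 0.6 defaults are 64 MB and 0.3 million ops -->
    <MemtableThroughputInMB>256</MemtableThroughputInMB>
    <MemtableOperationsInMillions>3</MemtableOperationsInMillions>
    <MemtableFlushAfterMinutes>60</MemtableFlushAfterMinutes>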

Re: Node OOM Problems

2010-08-21 Thread Benjamin Black
Perhaps I missed it in one of the earlier emails, but what is your disk subsystem config? On Sat, Aug 21, 2010 at 2:18 AM, Wayne wrote: > I am already running with those options. I thought maybe that is why they > never get completed as they keep getting pushed down in priority? I am > getting tim

Re: Node OOM Problems

2010-08-21 Thread Wayne
I am already running with those options. I thought maybe that is why they never get completed as they keep getting pushed down in priority? I am getting timeouts now and then but for the most part the cluster keeps running. Is it normal/ok for the repair and compaction to take so long? It has been o

Re: Node OOM Problems

2010-08-21 Thread Jonathan Ellis
yes, the AES is the repair. if you are running linux, try adding the options to reduce compaction priority from http://wiki.apache.org/cassandra/PerformanceTuning On Sat, Aug 21, 2010 at 3:17 AM, Wayne wrote: > I could tell from munin that the disk utilization was getting crazy high, > but the s
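
If I recall the PerformanceTuning page correctly, the Linux-only options being referred to are JVM flags added to bin/cassandra.in.sh, roughly as below; verify against the wiki page before relying on them:

    # lower the priority of compaction threads relative to request handling
    JVM_OPTS="$JVM_OPTS -XX:+UseThreadPriorities"
    JVM_OPTS="$JVM_OPTS -XX:ThreadPriorityPolicy=42"
    JVM_OPTS="$JVM_OPTS -Dcassandra.compaction.priority=1"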

Re: Node OOM Problems

2010-08-20 Thread Bill de hÓra
On Fri, 2010-08-20 at 19:17 +0200, Wayne wrote: > WARN [MESSAGE-DESERIALIZER-POOL:1] 2010-08-20 16:57:02,602 > MessageDeserializationTask.java (line 47) dropping message > (1,078,378ms past timeout) > WARN [MESSAGE-DESERIALIZER-POOL:1] 2010-08-20 16:57:02,602 > MessageDeserializationTask.java (l

Re: Node OOM Problems

2010-08-20 Thread Jonathan Ellis
these warnings mean you have more requests queued up than you are able to handle. that request queue is what is using up most of your heap memory. On Fri, Aug 20, 2010 at 12:17 PM, Wayne wrote: > I turned off the creation of the secondary indexes which had the large rows > and all seemed good. T
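
A quick way to see that backlog on 0.6 is nodetool tpstats (the hostname below is a placeholder); a large Pending count on ROW-MUTATION-STAGE or MESSAGE-DESERIALIZER-POOL is exactly the queued work being described here:

    # prints Active/Pending/Completed counts for each thread pool on that node
    nodetool -h node1.example.com tpstats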

Re: Node OOM Problems

2010-08-20 Thread Wayne
I deleted ALL data and reset the nodes from scratch. There are no more large rows in there. 8-9megs MAX across all nodes. This appears to be a new problem. I restarted the node in question and it seems to be running fine, but I had to run repair on it as it appears to be missing a lot of data. On

Re: Node OOM Problems

2010-08-20 Thread Edward Capriolo
On Fri, Aug 20, 2010 at 1:17 PM, Wayne wrote: > I turned off the creation of the secondary indexes which had the large rows > and all seemed good. Thank you for the help. I was getting > 60k+/writes/second on the 6 node cluster. > > Unfortunately again three hours later a node went down. I can not

Re: Node OOM Problems

2010-08-20 Thread Wayne
I turned off the creation of the secondary indexes which had the large rows and all seemed good. Thank you for the help. I was getting 60k+ writes/second on the 6 node cluster. Unfortunately again three hours later a node went down. I can not even look at the logs when it started since they are go

Re: Node OOM Problems

2010-08-19 Thread Wayne
The NullPointerException does not crash the node. It only makes it flap/go down for a short period and then it comes back up. I do not see anything abnormal in the system log, only that single error in the cassandra.log. On Thu, Aug 19, 2010 at 11:42 PM, Peter Schuller <peter.schul...@infidyne.c

Re: Node OOM Problems

2010-08-19 Thread Peter Schuller
> Sorry; that meant the "set of data actually live (i.e., not garbage) in > the heap". In other words, the amount of memory truly "used". And to clarify further this is not the same as the 'used' reported by GC statistics, except as printed after a CMS concurrent mark/sweep has completed (and even
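
Assuming the standard JDK 6 tools are on the box, a practical way to approximate the live set is to force a full collection and look at what survives; neither command is Cassandra-specific, and the pid is a placeholder:

    # histogram of reachable objects; the :live option forces a full GC first
    jmap -histo:live <cassandra-pid> | head -30

    # GC utilisation sampled every 5 seconds; watch the O (old generation) column
    jstat -gcutil <cassandra-pid> 5000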

Re: Node OOM Problems

2010-08-19 Thread Peter Schuller
> What is my "live set"? Sorry; that meant the "set of data actually live (i.e., not garbage) in the heap". In other words, the amount of memory truly "used". > Is the system CPU bound given the few statements > below? This is from running 4 concurrent processes against the node...do I > need to t

Re: Node OOM Problems

2010-08-19 Thread Edward Capriolo
On Thu, Aug 19, 2010 at 4:49 PM, Wayne wrote: > What is my "live set"? Is the system CPU bound given the few statements > below? This is from running 4 concurrent processes against the node...do I > need to throttle back the concurrent read/writers? > > I do all reads/writes as Quorum. (Replicatio

Re: Node OOM Problems

2010-08-19 Thread Wayne
What is my "live set"? Is the system CPU bound given the few statements below? This is from running 4 concurrent processes against the node...do I need to throttle back the concurrent read/writers? I do all reads/writes as Quorum. (Replication factor of 3). The memtable threshold is the default o
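
One point worth keeping in mind when reading the per-node numbers in this thread (my arithmetic, using the figures quoted elsewhere in it):

    quorum at RF=3 = floor(3/2) + 1 = 2 replicas must acknowledge each read/write,
    but every write is still delivered to all 3 replicas, so:

    60k client writes/sec across 6 nodes x RF 3 = 180k row mutations/sec cluster-wide
                                                = roughly 30k mutations/sec per node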

Re: Node OOM Problems

2010-08-19 Thread Peter Schuller
> of a rogue large row is one I never considered. The largest row on the other > nodes is as much as 800megs. I can not get a cfstats reading on the bad node. With 0.6 I can definitely see this being a problem if I understand its behavior correctly (I have not actually used 0.6 even for testing). I
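
The row sizes being quoted here come from nodetool cfstats (the hostname is a placeholder); if I am remembering the 0.6 output correctly, the per-column-family section includes compacted row minimum/maximum/mean sizes, which is the quickest way to spot a rogue row:

    nodetool -h node1.example.com cfstats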

Re: Node OOM Problems

2010-08-19 Thread Peter Schuller
So, these: >  INFO [GC inspection] 2010-08-19 16:34:46,656 GCInspector.java (line 116) GC > for ConcurrentMarkSweep: 41615 ms, 192522712 reclaimed leaving 8326856720 > used; max is 8700035072 [snip] > INFO [GC inspection] 2010-08-19 16:36:00,786 GCInspector.java (line 116) GC > for ConcurrentMark
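
Reading the numbers in the first of those GCInspector lines makes the problem plain:

    8,326,856,720 used / 8,700,035,072 max = ~96% of the heap still in use after a
    41.6 second CMS pass that reclaimed only ~190 MB

In other words the live set is essentially the whole heap, the collector has nothing left to reclaim, and an OOM (or a GC death spiral) is the predictable outcome.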

Re: Node OOM Problems

2010-08-19 Thread Edward Capriolo
On Thu, Aug 19, 2010 at 4:13 PM, Wayne wrote: > We are using the random partitioner. The tokens we defined manually and data > is almost totally equal among nodes, 15GB per node when the trouble started. > System vitals look fine. CPU load is ~500% for java, iostats are low, > everything for all p
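
For the record, equally spaced manual tokens for the random partitioner are normally computed as i * 2**127 / N for nodes i = 0..N-1; a sanity check for a 6-node ring (plain arithmetic, not from the thread):

    token(i) = i * (2**127 / 6),  i = 0..5
    e.g. token(1) = 28356863910078205288614550619314017621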

Re: Node OOM Problems

2010-08-19 Thread Edward Capriolo
On Thu, Aug 19, 2010 at 2:48 PM, Wayne wrote: > I am having some serious problems keeping a 6 node cluster up and running > and stable under load. Any help would be greatly appreciated. > > Basically it always comes back to OOM errors that never seem to subside. > After 5 minutes or 3 hours of hea