Re: [Marketing Mail] Re: Memory leak and lockup on our 2.2.7 Cassandra cluster.

2016-08-04 Thread Jonathan Haddad
In the future you may find SASI indexes useful for indexing Cassandra data. Shameless blog post plug: http://rustyrazorblade.com/2016/02/cassandra-secondary-index-preview-1/ Deep technical dive: http://www.doanduyhai.com/blog/?p=2058
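
For readers who want to see what Jonathan is pointing at: a minimal sketch of a SASI index definition, assuming Cassandra 3.4 or later (SASI is not available on the 2.2.7 cluster in this thread). The keyspace, table, and column names are placeholders.

    # Placeholder names; SASI requires Cassandra >= 3.4.
    cqlsh -e "
      CREATE CUSTOM INDEX items_body_sasi ON content.items (body)
      USING 'org.apache.cassandra.index.sasi.SASIIndex'
      WITH OPTIONS = {
        'mode': 'CONTAINS',
        'analyzer_class': 'org.apache.cassandra.index.sasi.analyzer.NonTokenizingAnalyzer',
        'case_sensitive': 'false'
      };"

CONTAINS mode allows LIKE '%term%' style queries against the indexed column; PREFIX mode is cheaper if only leading-edge matches are needed.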

Re: [Marketing Mail] Re: Memory leak and lockup on our 2.2.7 Cassandra cluster.

2016-08-04 Thread Kevin Burton
BTW, we think we tracked this down to using large partitions to implement inverted indexes. C* just doesn't do a reasonable job at all with large partitions, so we're going to migrate this use case to Elasticsearch.

Re: [Marketing Mail] Re: Memory leak and lockup on our 2.2.7 Cassandra cluster.

2016-08-03 Thread Ben Slater
Yep, that was what I was referring to.

Re: [Marketing Mail] Re: Memory leak and lockup on our 2.2.7 Cassandra cluster.

2016-08-03 Thread Henrik Schröder
Have you tried using the G1 garbage collector instead of CMS? We had the same issue: things were normally fine, but as soon as something extraordinary happened, a node could go into GC hell and never recover, and that could then spread to other nodes as they took up the slack, trapping them in...
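
A rough sketch of what that switch looks like in cassandra-env.sh on 2.2; the flag values are illustrative, not tuned recommendations.

    # cassandra-env.sh: comment out the CMS-specific JVM_OPTS lines that ship
    # with 2.2, then enable G1. The pause target is illustrative only.
    JVM_OPTS="$JVM_OPTS -XX:+UseG1GC"
    JVM_OPTS="$JVM_OPTS -XX:MaxGCPauseMillis=500"
    JVM_OPTS="$JVM_OPTS -XX:G1RSetUpdatingPauseTimePercent=5"
    # With G1, leave HEAP_NEWSIZE / -Xmn unset so the collector sizes the
    # young generation itself.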

Re: [Marketing Mail] Re: Memory leak and lockup on our 2.2.7 Cassandra cluster.

2016-08-03 Thread Kevin Burton
We usually use 100 per 5-minute window... but you're right. We might actually move this use case over to Elasticsearch in the next couple of weeks.

Re: [Marketing Mail] Re: Memory leak and lockup on our 2.2.7 Cassandra cluster.

2016-08-03 Thread Jonathan Haddad
Kevin, "Our scheme uses large buckets of content where we write to a bucket/partition for 5 minutes, then move to a new one." Are you writing to a single partition and only that partition for 5 minutes? If so, you should really rethink your data model. This method does not scale as you add node

Re: [Marketing Mail] Re: Memory leak and lockup on our 2.2.7 Cassandra cluster.

2016-08-03 Thread Reynald Bourtembourg
Hi, Maybe Ben was referring to this issue which has been mentioned recently on this mailing list: https://issues.apache.org/jira/browse/CASSANDRA-11887 Cheers, Reynald

Re: Memory leak and lockup on our 2.2.7 Cassandra cluster.

2016-08-03 Thread Romain Hardouin
> Curious why the 2.2 to 3.x upgrade path is risky at best. I guess that upgrade from 2.2 is less tested by DataStax QA because DSE 4 used C* 2.1, not 2.2. I would say the safest upgrade is 2.1 to 3.0.x. Best, Romain

Re: Memory leak and lockup on our 2.2.7 Cassandra cluster.

2016-08-03 Thread Kevin Burton
DuyHai, yes. We're generally happy with our disk throughput. We're on all SSDs and have about 60 boxes. The amount of data written isn't THAT much, maybe 5GB max... but it's over 60 boxes.

Re: Memory leak and lockup on our 2.2.7 Cassandra cluster.

2016-08-03 Thread Kevin Burton
Curious why the 2.2 to 3.x upgrade path is risky at best. Do you mean that this is just for OUR use case since we're having some issues, or that the upgrade path is risky in general?

Re: Memory leak and lockup on our 2.2.7 Cassandra cluster.

2016-08-03 Thread DuyHai Doan
On a side note, do you monitor your disk I/O to see whether the disk bandwidth can keep up with the huge spikes in writes? Use dstat during the insert storm to see if you have big values for CPU wait.
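
For reference, one way to watch this during an insert storm (iostat is added here as a cross-check, not something mentioned in the thread):

    # CPU (including wait), disk throughput and per-disk utilisation, 5s samples
    dstat -tcd --disk-util 5
    # Per-device utilisation and await times as a cross-check
    iostat -x 5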

Re: Memory leak and lockup on our 2.2.7 Cassandra cluster.

2016-08-03 Thread Ben Slater
Yes, it looks like you have at least one 100MB partition, which is big enough to cause issues. When you do lots of writes to the large partition it is likely to end up getting compacted (as per the log), and compactions often use a lot of memory / cause a lot of GC when they hit large partitions...
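
Two ways to confirm partition sizes on 2.2 (keyspace/table names below are placeholders):

    # "Compacted partition maximum bytes" appears in the per-table stats
    nodetool cfstats my_keyspace.my_table
    # Partition size and cell count percentiles for the same table
    nodetool cfhistograms my_keyspace my_table
    # 2.2 also logs a warning when a compacted partition exceeds
    # compaction_large_partition_warning_threshold_mb (cassandra.yaml, default 100)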

Re: Memory leak and lockup on our 2.2.7 Cassandra cluster.

2016-08-02 Thread Kevin Burton
I have a theory as to what I think is happening here. There is a correlation between massive bursts of content arriving all at once and our outages. Our scheme uses large buckets of content where we write to a bucket/partition for 5 minutes, then move to a new one. This way we can page through buckets. I think...
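
A minimal sketch of the bucketing layout as described, with hypothetical names, to make the theory concrete: if the 5-minute bucket is the whole partition key, every write during that window lands in the same partition and its replicas, and that partition can grow very large before it is compacted.

    # Hypothetical layout matching the description above.
    cqlsh -e "
      CREATE TABLE content.buckets (
        bucket timestamp,   -- timestamp truncated to the 5-minute window
        id     timeuuid,
        body   text,
        PRIMARY KEY ((bucket), id)
      );"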

Memory leak and lockup on our 2.2.7 Cassandra cluster.

2016-08-02 Thread Kevin Burton
We have a 60-node C* cluster running 2.2.7 with about 20GB of RAM allocated to each node. We're aware of the recommended 8GB limit to keep GCs low, but our memory has been creeping up, (probably) related to this bug. Here's what we're seeing... if we do a low level of writes we think everything...
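
For context, heap sizing on 2.2 lives in cassandra-env.sh; a sketch of pinning it explicitly rather than the roughly 20GB currently allocated per node (numbers are illustrative, not a tuned recommendation):

    # cassandra-env.sh: set the heap explicitly instead of relying on the
    # auto-calculation.
    MAX_HEAP_SIZE="8G"
    HEAP_NEWSIZE="2G"   # young-gen size; only relevant with the default CMS setup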