Re: Unexplainably large reported partition sizes

2016-03-07 Thread Nate McCall
> Rob, can you remember which bug/jira this was? I have not been able to find it.
> I'm using 2.1.9.
https://issues.apache.org/jira/browse/CASSANDRA-7953 Rob may have a different one, but I've seen something similar from this issue. Fixed in 2.1.12.
--
Nate McCall
Austin,

Re: Unexplainably large reported partition sizes

2016-03-07 Thread Tom van den Berge
Hi Bryan,
> Do you use any collections on this column family? We've had issues in the past with unexpectedly large partitions reported on data models with collections, which can also generate tons of tombstones on UPDATE (https://issues.apache.org/jira/browse/CASSANDRA-10547)
I've been

Re: Unexplainably large reported partition sizes

2016-03-07 Thread Tom van den Berge
Hi Rob,
The reason I didn't dump the table with sstable2json is that I didn't think of it ;) I just used it, and it looks very much like the "avalanche of tombstones" bug you are describing! I took one of the three sstables containing the key, and it resulted in a 4.75-million-line JSON file, of
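A multi-million-line dump is hard to eyeball, so here is a minimal sketch for tallying live versus tombstone cells for one partition key. It assumes the 2.1-era sstable2json layout, in which the output is a JSON array of {"key": ..., "cells": [...]} objects and each cell is a list whose optional fourth element is a flag, "d" for a deleted cell and "t" for a range tombstone. Verify that against your own dump before trusting the numbers; the function name is hypothetical.

```python
import json
from collections import Counter

# Assumption: 2.1-era sstable2json cell layout [name, value, timestamp, flag?, ...]
# where flag "d" marks a deleted cell and "t" a range tombstone.
TOMBSTONE_FLAGS = {"d", "t"}


def tally_cells(dump_text: str, partition_key: str) -> Counter:
    """Count live vs tombstone cells for one partition key in an sstable2json dump."""
    counts = Counter()
    for row in json.loads(dump_text):  # loads the whole dump into memory
        if row.get("key") != partition_key:
            continue
        for cell in row.get("cells", []):
            flag = cell[3] if len(cell) > 3 else None
            kind = "tombstone" if flag in TOMBSTONE_FLAGS else "live"
            counts[kind] += 1
    return counts
```

For a file of that size a streaming JSON parser would be kinder to memory; json.loads keeps the sketch short.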

Re: Unexplainably large reported partition sizes

2016-03-07 Thread Bryan Cheng
Hi Tom,
Do you use any collections on this column family? We've had issues in the past with unexpectedly large partitions reported on data models with collections, which can also generate tons of tombstones on UPDATE (https://issues.apache.org/jira/browse/CASSANDRA-10547)
--Bryan
On Mon, Mar 7
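Why collections can produce tombstones without any DELETE can be shown with a toy model. It assumes the CQL behaviour that overwriting a whole non-frozen collection (SET col = {...}) writes a tombstone shadowing the old contents, whereas appending (SET col = col + {...}) only writes the new elements. The class and method names are hypothetical and nothing here talks to a real cluster.

```python
from dataclasses import dataclass, field
from typing import Iterable, Set


@dataclass
class CollectionColumnModel:
    """Toy model (hypothetical) of writes to one non-frozen set<int> column."""
    live: Set[int] = field(default_factory=set)
    tombstones: int = 0

    def overwrite(self, values: Iterable[int]) -> None:
        # UPDATE ... SET col = {...}: old contents are shadowed by a tombstone
        # before the new elements are written.
        self.tombstones += 1
        self.live = set(values)

    def append(self, values: Iterable[int]) -> None:
        # UPDATE ... SET col = col + {...}: only the new elements are written.
        self.live |= set(values)
```

Overwriting the same column 1000 times leaves one live element but 1000 tombstones in the model, which is the kind of pattern that inflates reported partition sizes.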

Re: How can I make Cassandra stable in a 2GB RAM node environment ?

2016-03-07 Thread Ben Bromhead
+1 for http://opensourceconnections.com/blog/2013/08/31/building-the-perfect-cassandra-test-environment/
We also run Cassandra on t2.mediums for our developer clusters. You can force Cassandra to
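For context on the 2GB case: as we read the 2.x/3.x cassandra-env.sh, when MAX_HEAP_SIZE is not set the script picks max(min(RAM/2, 1024MB), min(RAM/4, 8192MB)), so a 2GB node hands half its memory to the heap by default. The sketch below only reproduces that arithmetic (the function name is ours); setting MAX_HEAP_SIZE and HEAP_NEWSIZE in cassandra-env.sh overrides it.

```python
def default_max_heap_mb(system_memory_mb: int) -> int:
    """Default max heap as computed by cassandra-env.sh (2.x/3.x era), in MB.

    Integer division mirrors the shell script's use of `expr`.
    """
    half = min(system_memory_mb // 2, 1024)      # half of RAM, capped at 1GB
    quarter = min(system_memory_mb // 4, 8192)   # quarter of RAM, capped at 8GB
    return max(half, quarter)
```

For example, default_max_heap_mb(2048) gives 1024 and default_max_heap_mb(8192) gives 2048.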

Re: How can I make Cassandra stable in a 2GB RAM node environment ?

2016-03-07 Thread Robert Coli
On Fri, Mar 4, 2016 at 8:27 PM, Jack Krupansky wrote:
> Please review the minimum hardware requirements as clearly documented:
> http://docs.datastax.com/en/cassandra/3.x/cassandra/planning/planPlanningHardware.html
That is a document for DataStax Cassandra, not Apache Cassandra. It's wonder

Re: Unexplainably large reported partition sizes

2016-03-07 Thread Robert Coli
On Sat, Mar 5, 2016 at 9:16 AM, Tom van den Berge wrote:
> I don't think compression can be the cause of the difference, because of two reasons:
Your two reasons seem legitimate. Though you say you do not frequently do DELETE and so it shouldn't be due to tombstones, there are semi-recent v

Re: moving keyspaces to another disk while Cassandra is running

2016-03-07 Thread Robert Coli
On Mon, Mar 7, 2016 at 2:57 AM, Krzysztof Księżyk wrote:
> I see in the lsof output that even if the keyspace is not queried, Cassandra keeps its files open, so I guess it's not safe to hotswap, but I'd like to make sure.
It is not safe for exactly this reason. Just restart your nodes. Were I doing
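To check which files under a keyspace's data directory the Cassandra process holds open, roughly what lsof shows, a minimal Linux-only sketch can read the descriptor links in /proc/<pid>/fd. It assumes you know the Cassandra PID and have permission to inspect it (typically the same user or root); the function name is hypothetical.

```python
import os
from typing import List


def open_files_under(pid: int, directory: str) -> List[str]:
    """Return paths held open by process `pid` that lie under `directory` (Linux only)."""
    root = os.path.realpath(directory)
    fd_dir = f"/proc/{pid}/fd"
    hits = set()
    for fd in os.listdir(fd_dir):
        try:
            target = os.readlink(os.path.join(fd_dir, fd))
        except OSError:
            continue  # descriptor was closed between listdir and readlink
        if target == root or target.startswith(root + os.sep):
            hits.add(target)
    return sorted(hits)
```

A non-empty result for the old data directory is exactly why moving it while the node runs is unsafe.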

Re: How to create an additional cluster in Cassandra exclusively for Analytics Purpose

2016-03-07 Thread Bhuvan Rawal
Thanks for the correction, Jon. (At most 2000 queries *per cluster* for serving 100 searches.)
On Mon, Mar 7, 2016 at 11:47 PM, Jonathan Haddad wrote:
> If you're doing 100 searches a second, each machine will be serving at most 100 requests per second, not 2000.
> On Mon, Mar 7, 2016 at 10:13

Re: How to create an additional cluster in Cassandra exclusively for Analytics Purpose

2016-03-07 Thread Jonathan Haddad
If you're doing 100 searches a second, each machine will be serving at most 100 requests per second, not 2000.
On Mon, Mar 7, 2016 at 10:13 AM Bhuvan Rawal wrote:
> Well, that's certainly true. There are these points worth discussing here:
> 1. Scatter-gather queries: especially if the cluster

Re: How to create an additional cluster in Cassandra exclusively for Analytics Purpose

2016-03-07 Thread Bhuvan Rawal
Well, that's certainly true. There are these points worth discussing here:
1. Scatter-gather queries: especially if the cluster size is large. Say we have a 20-node cluster, and we are searching 100 times a second. Then effectively the coordinator would be hitting each node 2000 times (20*100). That fa
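The arithmetic behind Jon's correction, as a back-of-envelope sketch. It assumes every search fans out to all nodes exactly once and ignores replica selection, retries and speculative reads; the function name is illustrative.

```python
from typing import Dict


def scatter_gather_load(nodes: int, searches_per_sec: float) -> Dict[str, float]:
    """Request rates when each search is sent once to every node in the cluster."""
    return {
        # each node sees every search once, no more
        "per_node_per_sec": searches_per_sec,
        # summed over the whole cluster
        "cluster_wide_per_sec": nodes * searches_per_sec,
    }
```

With 20 nodes and 100 searches a second this gives 100 requests per second per node and 2000 per second cluster-wide.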

Query regarding filter and where in spark on cassandra

2016-03-07 Thread Siddharth Verma
Hi,
While working with Spark running on top of Cassandra, I wanted to do some filtering on the data. It can be done either on the server side (a where clause when the cassandraTable query is written) or on the client side (a filter transformation on the RDD). Which one of them is preferred, keeping performance and time in m
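A toy model of the trade-off, counting rows moved from Cassandra to Spark. It does not use the spark-cassandra-connector API. It only assumes that a where clause is evaluated by Cassandra before rows are sent, while an RDD filter runs after the full scan has been transferred. Cassandra can only evaluate predicates CQL permits (partition key, clustering columns, secondary indexes), so anything else has to remain a client-side filter. The function names are hypothetical.

```python
from typing import Callable, Dict, List, Tuple

Row = Dict[str, object]
Predicate = Callable[[Row], bool]


def server_side(table: List[Row], pred: Predicate) -> Tuple[int, List[Row]]:
    """Pushed-down predicate: Cassandra filters, only matches are transferred."""
    kept = [r for r in table if pred(r)]
    return len(kept), kept  # (rows transferred, result)


def client_side(table: List[Row], pred: Predicate) -> Tuple[int, List[Row]]:
    """RDD filter: every row is transferred first, then filtered in Spark."""
    transferred = list(table)
    kept = [r for r in transferred if pred(r)]
    return len(transferred), kept


if __name__ == "__main__":
    table = [{"id": i, "day": i % 10} for i in range(1000)]

    def wanted(r: Row) -> bool:
        return r["day"] == 3

    print(server_side(table, wanted)[0], client_side(table, wanted)[0])  # 100 1000
```

Both strategies return the same rows; only the volume crossing the network differs.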

Re: moving keyspaces to another disk while Cassandra is running

2016-03-07 Thread Jack Krupansky
If your data is replicated properly (RF=3) and you do QUORUM reads and writes, you should be able to shut down one node, adjust the configuration, and restart that node and all should be fine. Do it quickly enough (less than an hour) and the node should quickly catch up with any changes. How small
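Jack's reasoning rests on quorum arithmetic. A sketch for a single data center (with several DCs, QUORUM counts replicas across all of them, LOCAL_QUORUM only the local one): with RF=3, QUORUM needs 2 replicas, so one node can be down while QUORUM reads and writes still overlap. Function names are illustrative.

```python
def quorum(rf: int) -> int:
    """Replicas that must answer at QUORUM for replication factor rf."""
    return rf // 2 + 1


def replicas_that_may_be_down(rf: int) -> int:
    """Replicas of a token range that can be unavailable while QUORUM still succeeds."""
    return rf - quorum(rf)


def reads_see_latest_write(rf: int, read_replicas: int, write_replicas: int) -> bool:
    """Read and write replica sets are guaranteed to overlap when R + W > RF."""
    return read_replicas + write_replicas > rf


if __name__ == "__main__":
    rf = 3
    print(quorum(rf), replicas_that_may_be_down(rf),
          reads_see_latest_write(rf, quorum(rf), quorum(rf)))  # 2 1 True
```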

moving keyspaces to another disk while Cassandra is running

2016-03-07 Thread Krzysztof Księżyk
Hi,
I have a small Cassandra cluster running on boxes with a 256GB SSD and a 2TB HDD. Originally the SSD was for the system and commit log, and the HDD for data. But unfortunately, because of the nature of the queries, performance was not satisfactory, and to improve it, data were moved to the SSD as well. Now the problem is with