query by column size

2015-02-12 Thread chandra Varahala
Greetings, I have one column family with 10 columns, one of which stores XML/JSON. Is there a way I can query that column where size > 50 KB, assuming I have an index on that column? Thanks, CV.

Re: Safely delete tmplink files - 2.1.2

2015-02-12 Thread Demian Berjman
Thanks Rob! I'm going to test with 2.0.12. Cheers, On Wed, Feb 11, 2015 at 6:33 PM, Robert Coli wrote: > On Wed, Feb 11, 2015 at 7:52 AM, Demian Berjman > wrote: > >> Hi, we are expecting the 2.1.3 release to fix the delete of tmplink >> files. In the meantime, is it safe to delete this file

Re: High GC activity on node with 4TB on data

2015-02-12 Thread Jiri Horky
Number of cores: 2 x 6 cores x 2 (HT). I do agree with you that the hardware is certainly oversized for just one Cassandra instance, but we got a very good price since we ordered several tens of the same nodes for a different project. That's why we use them for multiple Cassandra instances. Jirka H. On 02/

Re: Pagination support on Java Driver Query API

2015-02-12 Thread Bulat Shakirzyanov
Fixed my Mail.app settings so you can see my actual name, sorry. > On Feb 12, 2015, at 8:55 AM, DataStax wrote: > > Hello, > > As was mentioned earlier, the Java driver doesn’t actually perform pagination. > > Instead, it uses the Cassandra native protocol to set the page size of the result > set. >

Re: Added new nodes to cluster but no streams

2015-02-12 Thread Robert Coli
On Thu, Feb 12, 2015 at 3:20 AM, Batranut Bogdan wrote: > I have added new nodes to the existing cluster. In Opscenter I do not see > any streams... I presume that the new nodes get the data from the rest of > the cluster via streams. The existing cluster has TB magnitude, and space > used in the

Re: Recommissioned a node

2015-02-12 Thread Robert Coli
On Thu, Feb 12, 2015 at 7:04 AM, Eric Stevens wrote: > IMO, especially with the threat to unrecoverable consistency violations, > this should be a critical bug. > You should file a JIRA, and let the list know what it is? :D I was never sure if it was just me being unreasonably literal to presum

Re: Pagination support on Java Driver Query API

2015-02-12 Thread DataStax
Hello, As was mentioned earlier, the Java driver doesn’t actually perform pagination. Instead, it uses the Cassandra native protocol to set the page size of the result set. (https://github.com/apache/cassandra/blob/trunk/doc/native_protocol_v2.spec#L699-L730
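To illustrate the division of labour described in this reply, here is a minimal Python simulation, not the driver's code. The hypothetical `fetch_page` plays the server, returning at most `page_size` rows plus a paging state. The hypothetical `iterate_all` plays the driver, asking for the next page only when the current one is used up. In the real protocol the paging state is an opaque blob that the client echoes back; the integer offset used here is purely an assumption for illustration.

```python
from typing import Iterator, List, Optional, Tuple

Row = Tuple[int, int, int]


def fetch_page(table: List[Row], page_size: int,
               paging_state: Optional[int]) -> Tuple[List[Row], Optional[int]]:
    """Hypothetical server side: return one page and the state to resume from.

    The paging state is a plain offset here (an assumption); in the native
    protocol it is opaque to the client.
    """
    start = paging_state or 0
    page = table[start:start + page_size]
    # Only hand back a state when more rows remain after this page.
    next_state = start + page_size if start + page_size < len(table) else None
    return page, next_state


def iterate_all(table: List[Row], page_size: int) -> Iterator[Row]:
    """Hypothetical driver side: fetch pages lazily while the caller iterates."""
    state: Optional[int] = None
    while True:
        page, state = fetch_page(table, page_size, state)
        yield from page
        if state is None:  # the "server" reported no further pages
            return
```

The caller just iterates, e.g. `list(iterate_all([(1, 1, i) for i in range(5)], 2))`; the three page round trips stay hidden behind the iterator, which is roughly what a page-size setting buys you in the driver.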

Re: Recommissioned a node

2015-02-12 Thread Stefano Ortolani
Definitely; I think the very same regarding this issue. On Thu, Feb 12, 2015 at 7:04 AM, Eric Stevens wrote: > I definitely find it surprising that a node which was decommissioned is > willing to rejoin a cluster. I can't think of any legitimate scenario > where you'd want that, and I'm surprised the

Re: Pagination support on Java Driver Query API

2015-02-12 Thread Eric Stevens
I don't know what the shape of the page state data is deep inside the Java driver. I've actually tried to dig into that in the past and understand it, to see if I could reproduce it as a general-purpose, any-query kind of thing. I gave up before I fully understood it, but I think it's actually a hand

Re: Pagination support on Java Driver Query API

2015-02-12 Thread Eric Stevens
Thanks Ondřej! Definitely much easier. N.B.: this is a new feature in 2.0.x; it will not work in 1.2.x. cqlsh:scratch> SELECT * FROM foo WHERE partitionkey = 1 and (ck1, ck2) > (1,2) limit 2; Bad Request: line 1:45 no viable alternative at input '(' On Thu, Feb 12, 2015 at 8:44 AM, Ondřej Nešpo

Re: Pagination support on Java Driver Query API

2015-02-12 Thread Ondřej Nešpor
There is a much easier way to do that (and I suppose the Java driver does it this way): page 1:
SELECT * FROM foo WHERE partitionkey = 1 limit 2;
 partitionkey | ck1 | ck2 | col1 | col2
--------------+-----+-----+------+------
            1 |   1 |   3 |    3 |    3
            1 |   1 |   2
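The idea in this reply is keyset pagination: remember the last `(ck1, ck2)` returned and ask for rows strictly after it with a multi-column `(ck1, ck2) > (a, b)` restriction. The following is a minimal Python simulation under stated assumptions, not Cassandra code. The table `foo` is modelled as a list of `(partitionkey, ck1, ck2, col1, col2)` tuples, clustering order is assumed to be ascending, and `select_page`/`all_pages` are hypothetical helpers. Python compares tuples lexicographically, which matches the ordering that the CQL tuple restriction uses for ascending clustering columns.

```python
from typing import List, Optional, Tuple

# Hypothetical rows of foo, PRIMARY KEY ((partitionkey), ck1, ck2),
# stored as (partitionkey, ck1, ck2, col1, col2).
Row = Tuple[int, int, int, int, int]


def select_page(rows: List[Row], pk: int, after: Optional[Tuple[int, int]],
                limit: int) -> List[Row]:
    """Emulate: SELECT * FROM foo WHERE partitionkey = pk
                [AND (ck1, ck2) > after] LIMIT limit;"""
    # Ascending clustering order within the partition (an assumption).
    in_partition = sorted(r for r in rows if r[0] == pk)
    if after is not None:
        # Lexicographic tuple comparison, like the CQL (ck1, ck2) > (a, b) slice.
        in_partition = [r for r in in_partition if (r[1], r[2]) > after]
    return in_partition[:limit]


def all_pages(rows: List[Row], pk: int, page_size: int) -> List[List[Row]]:
    """Walk one partition page by page; the last (ck1, ck2) is the page state."""
    pages: List[List[Row]] = []
    last: Optional[Tuple[int, int]] = None
    while True:
        page = select_page(rows, pk, last, page_size)
        if not page:
            return pages
        pages.append(page)
        last = (page[-1][1], page[-1][2])
```

With two rows per page, each page after the first is a single query. This is why the tuple form is simpler than the 1.2.x workaround discussed further down the thread.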

Re: High GC activity on node with 4TB on data

2015-02-12 Thread Eric Stevens
> each node has 256G of memory, 24x1T drives, 2x Xeon CPU I don't have first-hand experience running Cassandra on such massive hardware, but it strikes me that these machines are dramatically oversized to be good candidates for Cassandra (though I wonder how many cores are in those CPUs; I'm guess

Re: Pagination support on Java Driver Query API

2015-02-12 Thread Ajay
Thanks Eric. I figured out the same but didn't get time to put it in an email. But it is tightly tied to how data is stored internally in Cassandra: basically, how partition keys are used to distribute data (less likely to change; we do not depend directly on the partitioning algorithm) and clust

Re: Recommissioned a node

2015-02-12 Thread Eric Stevens
I definitely find it surprising that a node which was decommissioned is willing to rejoin a cluster. I can't think of any legitimate scenario where you'd want that, and I'm surprised the node doesn't track that it was decommissioned and refuse to rejoin without at least a -D flag to force it. Way

Re: Pagination support on Java Driver Query API

2015-02-12 Thread Eric Stevens
Your page state then needs to track the last ck1 and last ck2 you saw. Pages 2+ may each need up to two queries if the first query doesn't fill the page. CREATE TABLE foo ( partitionkey int, ck1 int, ck2 int, col1 int, col2 int, PRIMARY KEY ((partitionkey), ck1, ck2) )
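Without tuple comparison (1.2.x), the page state `(last_ck1, last_ck2)` has to be resumed in two steps. First, finish the current `ck1` with `ck1 = last_ck1 AND ck2 > last_ck2`. Then, only if that left the page short, continue with `ck1 > last_ck1` for the remaining slots. The following is a minimal Python simulation of that logic, assuming ascending clustering order and the `foo` schema above modelled as `(partitionkey, ck1, ck2, col1, col2)` tuples. `next_page` is a hypothetical helper, and the CQL in the comments is only what each step would correspond to.

```python
from typing import List, Optional, Tuple

# Hypothetical rows of foo as (partitionkey, ck1, ck2, col1, col2).
Row = Tuple[int, int, int, int, int]


def next_page(rows: List[Row], pk: int, last: Optional[Tuple[int, int]],
              page_size: int) -> List[Row]:
    # Ascending clustering order within the partition (an assumption).
    part = sorted(r for r in rows if r[0] == pk)
    if last is None:
        # Page 1: SELECT * FROM foo WHERE partitionkey = ? LIMIT n;
        return part[:page_size]
    last_ck1, last_ck2 = last
    # Query 1: ... WHERE partitionkey = ? AND ck1 = last_ck1 AND ck2 > last_ck2 LIMIT n;
    page = [r for r in part if r[1] == last_ck1 and r[2] > last_ck2][:page_size]
    if len(page) < page_size:
        # Query 2, only when query 1 did not fill the page:
        # ... WHERE partitionkey = ? AND ck1 > last_ck1 LIMIT (n - rows so far);
        rest = [r for r in part if r[1] > last_ck1]
        page += rest[:page_size - len(page)]
    return page
```

For example, resuming after `(1, 2)` with a page size of 2 returns one row from query 1 and one from query 2. This is the same result as the single `(ck1, ck2) > (1, 2)` query that 2.0.x accepts.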

Re: best supported spark connector for Cassandra

2015-02-12 Thread Marcelo Valle (BLOOMBERG/ LONDON)
Thanks for the hint, Gaspar. Do you know if Stratio Deep / Stratio Cassandra are also licensed under Apache 2.0? I was interested in learning more about Stratio when I was working at a startup. Now, at a blue-chip company, it seems one of the hardest obstacles to using Cassandra in a project is the need for an area

Re: How to speed up SELECT * query in Cassandra

2015-02-12 Thread Marcelo Valle (BLOOMBERG/ LONDON)
Thanks Jirka! From: user@cassandra.apache.org Subject: Re: How to speed up SELECT * query in Cassandra Hi, here are some snippets of code in scala which should get you started. Jirka H. loop { lastRow => val query = lastRow matc
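The Scala snippet is cut off in this archive, so its exact logic is unknown. One common way to walk a whole table with bounded queries is to resume each query from the token of the last partition key seen, along the lines of `SELECT k, v FROM t WHERE token(k) > token(?) LIMIT n`. The following is a minimal Python simulation of that idea, not the snippet's actual code. It assumes one row per partition (with clustering columns, a page that ends mid-partition would skip that partition's remaining rows). It also uses `zlib.crc32` as a stand-in token function, whereas Cassandra's default partitioner is Murmur3. `token`, `select_after` and `scan` are hypothetical names.

```python
import zlib
from typing import Dict, Iterator, List, Optional, Tuple


def token(key: str) -> int:
    """Stand-in for the partitioner's token function (crc32 is an assumption,
    chosen only to keep the sketch dependency-free)."""
    return zlib.crc32(key.encode("utf-8"))


def select_after(table: Dict[str, str], last_key: Optional[str],
                 limit: int) -> List[Tuple[str, str]]:
    """Emulate: SELECT k, v FROM t [WHERE token(k) > token(last_key)] LIMIT limit;
    rows come back in token order, as they would from the cluster."""
    rows = sorted(table.items(), key=lambda kv: token(kv[0]))
    if last_key is not None:
        rows = [kv for kv in rows if token(kv[0]) > token(last_key)]
    return rows[:limit]


def scan(table: Dict[str, str], page_size: int) -> Iterator[Tuple[str, str]]:
    """Full scan: no restriction for the first query, then resume after the last row."""
    last_row: Optional[Tuple[str, str]] = None
    while True:
        last_key = None if last_row is None else last_row[0]
        page = select_after(table, last_key, page_size)
        if not page:
            return
        yield from page
        last_row = page[-1]
```

Each round trip is small and bounded, which is the point of scanning this way rather than issuing one unbounded `SELECT *`.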

Re: best supported spark connector for Cassandra

2015-02-12 Thread Gaspar Muñoz
My suggestion is to use Java or Scala instead of Python. For Java/Scala, both the DataStax and Stratio drivers are valid and similar options. As far as I know, they both take care of data locality and are not based on the Hadoop interface. The advantage of Stratio Deep is that it allows you to integr

Added new nodes to cluster but no streams

2015-02-12 Thread Batranut Bogdan
Hello, I have added new nodes to the existing cluster. In Opscenter I do not see any streams... I presume that the new nodes get the data from the rest of the cluster via streams. The existing cluster has TB magnitude, and space used in the new nodes is ~90 GB. I must admit that I have restarted

Re: Two problems with Cassandra

2015-02-12 Thread Pavel Velikhov
> On Feb 12, 2015, at 12:37 AM, Robert Coli wrote: > > On Wed, Feb 11, 2015 at 2:22 AM, Pavel Velikhov > wrote: > 2. While trying to update the full dataset with a simple transformation > (again via python driver), single node and clustered Cassandra run out