RE: Getting all unique keys

2017-08-23 Thread Durity, Sean R
DataStax Enterprise bundles Spark and the Spark connector on the DSE nodes and handles much of the plumbing work (and monitoring, etc.). Worth a look. Sean Durity

Re: Getting all unique keys

2017-08-21 Thread Avi Levi
Thanks Christophe, we will definitely consider that in the future.

Re: Getting all unique keys

2017-08-21 Thread Christophe Schmitz
Hi Avi, The Spark project documentation is quite good, as is the spark-cassandra-connector GitHub project, which contains some basic examples you can easily draw inspiration from. A few random tips you might find useful: - You will want one Spark worker on each node, and a Spark master on eith…

Re: Getting all unique keys

2017-08-20 Thread Avi Levi
Thanks Christophe, we didn't want to add too many moving parts, but it sounds like a good solution. Do you have any reference / link that I can look at? Cheers Avi

Re: Getting all unique keys

2017-08-20 Thread Christophe Schmitz
Hi Avi, Have you thought of using Spark for that work? If you co-locate the Spark workers on the Cassandra nodes, the spark-cassandra-connector will automatically split the token range for you in such a way that each Spark worker only hits the local Cassandra node. This will also be done in parallel…
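The token-range splitting described above can be illustrated with a small sketch. This is not the connector's actual code, just a conceptual model: the Murmur3 token ring spans [-2^63, 2^63 - 1], and the connector carves it into contiguous, non-overlapping subranges so each worker scans a disjoint slice of the data.

```python
# Illustrative sketch (not spark-cassandra-connector internals): divide the
# Murmur3 token ring into contiguous, non-overlapping subranges, one per split.

MIN_TOKEN = -2**63       # Murmur3Partitioner minimum token
MAX_TOKEN = 2**63 - 1    # Murmur3Partitioner maximum token

def split_token_ring(num_splits):
    """Return num_splits (start, end) pairs that together cover the full ring."""
    total = MAX_TOKEN - MIN_TOKEN + 1
    step = total // num_splits
    ranges = []
    start = MIN_TOKEN
    for i in range(num_splits):
        # The last split absorbs any remainder so the whole ring is covered.
        end = MAX_TOKEN if i == num_splits - 1 else start + step - 1
        ranges.append((start, end))
        start = end + 1
    return ranges

for lo, hi in split_token_ring(4):
    print(lo, hi)
```

Because the subranges are disjoint and exhaustive, each partition key is scanned by exactly one worker, which is what makes the parallel scan both complete and free of duplicates.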

Re: Getting all unique keys

2017-08-20 Thread Avi Levi
Thank you very much. One question: you wrote that I do not need DISTINCT here since it's part of the primary key, but only the combination is unique (PRIMARY KEY (id, timestamp)). Also, if I take the last token and feed it back as you showed, wouldn't I get overlapping boundaries?

Re: Getting all unique keys

2017-08-20 Thread Eric Stevens
You should be able to fairly efficiently iterate all the partition keys like:

select id, token(id) from table where token(id) >= -9204925292781066255 limit 1000;

 id | system.token(id)
----+------------------
...
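The paging loop implied by the query above, and the overlapping-boundary concern raised in the reply, can be sketched with an in-memory simulation (illustrative only; a real client would issue CQL against the cluster). The first page uses an inclusive lower bound (`token(id) >= MIN_TOKEN`); every subsequent page uses a strict bound (`token(id) > last_token`) so the boundary row is not read twice.

```python
# In-memory sketch of token-based paging over partition keys, mirroring the
# CQL pattern "WHERE token(id) >= ? LIMIT n" and avoiding overlap by switching
# to a strict "token > last_token" bound after the first page.

MIN_TOKEN = -2**63

def fetch_page(rows, lower, limit, inclusive):
    """rows: list of (token, id) sorted by token. Emulates one page of results."""
    if inclusive:
        page = [r for r in rows if r[0] >= lower]
    else:
        page = [r for r in rows if r[0] > lower]
    return page[:limit]

def iterate_all_keys(rows, page_size):
    rows = sorted(rows)
    seen = []
    lower, inclusive = MIN_TOKEN, True       # first page: token(id) >= MIN_TOKEN
    while True:
        page = fetch_page(rows, lower, page_size, inclusive)
        if not page:
            break
        seen.extend(pid for _, pid in page)
        lower, inclusive = page[-1][0], False  # next page: strictly greater
    return seen

# Example: 5 partition keys with distinct tokens, page size 2.
data = [(-900, "a"), (-10, "b"), (3, "c"), (42, "d"), (777, "e")]
print(iterate_all_keys(data, 2))  # each key appears exactly once
```

Using the strict bound on subsequent pages is what prevents the overlapping boundaries the thread asks about, assuming each partition key maps to a distinct token.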

Re: Getting all unique keys

2017-08-19 Thread Avi Levi
I need to get all unique keys (not the complete primary key, just the partition key) in order to aggregate all the relevant records of that key and apply some calculations on them.

CREATE TABLE my_table (
  id text,
  timestamp bigint,
  value double,
  PRIMARY KEY (id, timestamp)
)

I…

Re: Getting all unique keys

2017-08-18 Thread kurt greaves
You can SELECT DISTINCT in CQL, however I would recommend against such a pattern as it is very unlikely to be efficient and is prone to errors. A DISTINCT query will search every partition for the first live cell, which could be buried behind a lot of tombstones. It's safe to say at some point you wi…

Re: Getting all unique keys

2017-08-18 Thread Sruti S
Hi: Is this sensor data, hence the timestamp? How are you generating this 'key' field? Can you have only the 'key' field as the primary key? Even if not, since that field is part of the PK it may make such queries fast. However, are there other attributes that can be added that define a unique business key…