DataStax Enterprise bundles Spark and the Spark connector on the DSE nodes and
handles much of the plumbing work (and monitoring, etc.). Worth a look.
Sean Durity
From: Avi Levi [mailto:a...@indeni.com]
Sent: Tuesday, August 22, 2017 2:46 AM
To: user@cassandra.apache.org
Subject: Re: Getting all
Thanks Christophe, we will definitely consider that in the future.
On Mon, Aug 21, 2017 at 3:01 PM, Christophe Schmitz <
christo...@instaclustr.com> wrote:
Hi Avi,
The Spark project documentation is quite good, as is the
spark-cassandra-connector GitHub project, which contains some basic
examples you can easily draw inspiration from. A few random pieces of advice
you might find useful:
- You will want one Spark worker on each node, and a Spark master on eith
Thanks Christophe,
we didn't want to add too many moving parts, but it sounds like a good
solution. Do you have any reference or link that I can look at?
Cheers
Avi
On Mon, Aug 21, 2017 at 3:43 AM, Christophe Schmitz <
christo...@instaclustr.com> wrote:
Hi Avi,
Have you thought of using Spark for that work? If you co-locate the Spark
workers on each Cassandra node, the spark-cassandra connector will
automatically split the token range for you in such a way that each Spark
worker only hits the local Cassandra node. This will also be done in parall
Thank you very much. One question: you wrote that I do not need DISTINCT
here since id is part of the primary key, but only the combination is
unique (PRIMARY KEY (id, timestamp)). Also, if I take the last token and
feed it back as you showed, wouldn't I get overlapping boundaries?
On Sun,
You should be able to fairly efficiently iterate all the partition keys
like:
select id, token(id) from table where token(id) >= -9204925292781066255
limit 1000;
id | system.token(id)
+--
...
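To illustrate the paging idea, and the two concerns raised about it (repeated ids, since only (id, timestamp) is unique, and overlapping boundaries when the last token is fed back), here is a minimal Python sketch. It is not from the thread: fetch_page and make_fake_fetch are hypothetical names, and fetch_page is assumed to run a query of the form "select id, token(id) from table where token(id) > ? limit ?" and return rows in token order. Using a strict > on the last token seen avoids re-reading the boundary, and skipping consecutive repeats of the same id yields each partition key once. It also assumes no two different ids share a token; if that could matter, a key sitting exactly on a page boundary might be skipped.

```python
from typing import Callable, Iterator, List, Optional, Tuple

Row = Tuple[str, int]  # (id, token(id)) as returned by the paging query


def iter_partition_keys(
    fetch_page: Callable[[Optional[int], int], List[Row]],
    page_size: int = 1000,
) -> Iterator[str]:
    """Yield each partition key once, walking the ring in token order.

    fetch_page(after_token, limit) is a hypothetical callable. It is assumed
    to return up to `limit` rows with token(id) > after_token, in token
    order, or rows from the start of the ring when after_token is None.
    """
    last_token: Optional[int] = None
    last_id: Optional[str] = None
    while True:
        rows = fetch_page(last_token, page_size)
        if not rows:
            return
        for row_id, row_token in rows:
            # Rows of one partition are contiguous, so comparing with the
            # previous id is enough to drop repeats.
            if row_id != last_id:
                yield row_id
                last_id = row_id
            last_token = row_token
        # The next page starts strictly after last_token, so the boundary
        # partition is not read again.


def make_fake_fetch(rows: List[Row]) -> Callable[[Optional[int], int], List[Row]]:
    """In-memory stand-in for the CQL query, for trying the loop offline."""
    ordered = sorted(rows, key=lambda r: r[1])

    def fetch(after_token: Optional[int], limit: int) -> List[Row]:
        matching = [r for r in ordered if after_token is None or r[1] > after_token]
        return matching[:limit]

    return fetch
```

With a real driver, fetch_page would wrap a prepared statement; the loop itself does not depend on how the rows are fetched.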
I need to get all unique keys (not the complete primary key, just the
partition key) in order to aggregate all the relevant records of that key
and apply some calculations on it.
CREATE TABLE my_table (
    id text,
    timestamp bigint,
    value double,
    PRIMARY KEY (id, timestamp)
);
I
You can SELECT DISTINCT in CQL, however I would recommend against such a
pattern as it is very unlikely to be efficient, and prone to errors. A
distinct query will search every partition for the first live cell, which
could be buried behind a lot of tombstones. It's safe to say at some point
you wi
Hi,
Is this sensor data, hence the timestamp? How are you generating this 'key'
field? Can you have only the 'key' field as the primary key? Even if not, the
fact that the field is part of the PK may make such queries fast.
However, are there other attributes that can be added that define a unique
business key