cassandra table full scan utility
I undertook a similar effort a while ago.
https://issues.apache.org/jira/browse/CASSANDRA-7014
Other than the fact that it was closed with no comments, I can tell you
that other efforts I had to embed things in Cassandra did not go
swimmingly. Although at the time ideas were rejected like groovy
Hi Jonathan,
If a full scan is a regular requirement, then setting up a Spark cluster
co-located with the Cassandra nodes makes perfect sense. But supposing that it is
an occasional requirement, say a weekly or a fortnightly task, a Spark cluster
could be an added overhead with additional capacity, resource
Hi Jon,
It wasn't allowed.
Moreover, someone who isn't familiar with Spark, and who might be new to map,
filter, reduce, etc. operations, could also use the utility for some simple
operations, assuming a sequential scan of the Cassandra table.
Regards
Siddharth Verma
On Tue, Oct 4, 2016 at 1:32 AM, Jon
Couldn't set up as in couldn't get it working, or it's not allowed?
On Mon, Oct 3, 2016 at 3:23 PM Siddharth Verma
wrote:
> Hi Jon,
> We couldn't set up a Spark cluster.
>
> For some use cases, a Spark cluster was required, but for some reason we
> couldn't create a Spark cluster. Hence, one may use this
Hi Jon,
We couldn't set up a Spark cluster.
For some use cases, a Spark cluster was required, but for some reason we
couldn't create a Spark cluster. Hence, one may use this utility to iterate
through the entire table at very high speed.
Had to find a workaround that would be faster than paging on r
It almost sounds like you're duplicating all the work of both spark and the
connector. May I ask why you decided to not use the existing tools?
On Mon, Oct 3, 2016 at 2:21 PM siddharth verma
wrote:
> Hi DuyHai,
> Thanks for your reply.
> A few more features planned in the next one(if there is on
It will be interesting to have a comparison with spark here for basic use
cases.
From a naive observation it appears that this could be slower than spark as
a lot of data is streamed over network.
On the other hand in this approach we have seen that Young GC takes nearly
full CPU (possibly becau
Hi DuyHai,
Thanks for your reply.
A few more features are planned for the next version (if there is one), such as
a custom policy that takes into account the replication of token ranges on
specific nodes,
finer-grained token ranges (for more speedup),
and a few more.
I think, as fine graining a token range,
If one toke
Hello Siddharth
I just glanced over the architecture diagram. The idea of using
multiple threads, one for each token range, is great. It helps max out
parallelism.
With https://issues.apache.org/jira/browse/CASSANDRA-11521 it would be even
faster.
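
For readers following the thread, the one-thread-per-token-range idea can be sketched roughly as below. This is only an illustration, not the utility's actual code: it assumes the Murmur3Partitioner token bounds, and the names split_token_ring, range_query, scan_all and the fetch callable are hypothetical. A real fetch would run the query through a Cassandra driver session and return the rows.

```python
from concurrent.futures import ThreadPoolExecutor

# Assumption: the cluster uses Murmur3Partitioner, whose tokens are signed 64-bit.
MIN_TOKEN = -2**63
MAX_TOKEN = 2**63 - 1


def split_token_ring(num_splits):
    """Split the whole ring into num_splits contiguous (start, end] subranges."""
    span = MAX_TOKEN - MIN_TOKEN
    bounds = [MIN_TOKEN + (span * i) // num_splits for i in range(num_splits + 1)]
    return list(zip(bounds[:-1], bounds[1:]))


def range_query(keyspace, table, partition_key, start, end):
    """Build the CQL text for one subrange (all names are caller-supplied)."""
    # The first subrange uses "> MIN_TOKEN"; this sketch assumes no row is
    # ever stored at MIN_TOKEN itself.
    return (
        f"SELECT * FROM {keyspace}.{table} "
        f"WHERE token({partition_key}) > {start} "
        f"AND token({partition_key}) <= {end}"
    )


def scan_all(fetch, num_splits, workers=4):
    """Call the hypothetical fetch(start, end) for every subrange in parallel."""
    ranges = split_token_ring(num_splits)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map keeps the results in subrange order.
        for rows in pool.map(lambda r: fetch(*r), ranges):
            yield from rows
```

Raising num_splits gives finer-grained ranges, so more threads can be kept busy at once. Routing each subrange to a replica that owns it would need a custom load-balancing policy on top of this.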
On Mon, Oct 3, 2016 at 7:51 PM, siddharth v
Hi,
I was working on a utility that can be used for a Cassandra full table scan
at a tremendously high velocity: cassandra fast full table scan.
How fast?
The script dumped ~229 million rows in 116 seconds on a cluster of
6 nodes.
Data transfer rates of up to 25 MBps were observed on cassan