Re: Finding records that exist on Cassandra but not externally

2016-09-11 Thread Eric Stevens
I might be inclined to include a generation ID in the partition keys. Keep a separate table where you upgrade the generation ID when your processing is complete. You can even use CAS operations in case you goofed up and generated two generations at the same time (or your processing time exceeds y
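
A minimal sketch of the generation-ID idea, in case it helps to see it spelled out. It uses plain Python dictionaries as a stand-in for Cassandra, so `GenerationTable`, `compare_and_set`, `run_batch`, and `read_current` are all hypothetical names for illustration; in a real cluster the compare-and-set step would roughly correspond to a lightweight transaction (an `UPDATE ... IF` statement).

```python
class GenerationTable:
    """In-memory stand-in for a one-row table holding the current generation ID."""

    def __init__(self):
        self.current = 0

    def compare_and_set(self, expected, new):
        # Conceptually: UPDATE ... SET generation = new IF generation = expected
        if self.current != expected:
            return False  # someone else already published a generation
        self.current = new
        return True


def run_batch(data, gen_table, output_rows):
    """Write one batch under a fresh generation, then publish that generation."""
    expected = gen_table.current
    new_gen = expected + 1
    # The generation ID is part of the (simulated) partition key.
    for key, value in output_rows.items():
        data[(new_gen, key)] = value
    # Only flip the pointer once processing is complete.
    return gen_table.compare_and_set(expected, new_gen)


def read_current(data, gen_table):
    """Readers only see rows written under the published generation."""
    gen = gen_table.current
    return {key: value for (g, key), value in data.items() if g == gen}


data = {}
generations = GenerationTable()
run_batch(data, generations, {"a": 1, "b": 2})
run_batch(data, generations, {"a": 1})  # "b" no longer exists externally
print(read_current(data, generations))
```

With this layout, records that disappeared from the external output never need to be found and deleted individually: they simply are not part of the newest generation, and older generations can be dropped or left to expire.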

Re: Finding records that exist on Cassandra but not externally

2016-09-08 Thread Jens Rantil
Hi again Chris, Another option would be to have a look at using a Merkle Tree to quickly drill down to the differences. This is actually what Cassandra uses internally when running a repair between different nodes. Cheers, Jens On Wed, Sep 7, 2016 at 9:47 AM wrote: > First off I hope this appr
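
To illustrate the Merkle tree suggestion, here is a rough, self-contained sketch. It is not how Cassandra implements repair; the bucketing scheme and the names `bucket_of`, `build_tree`, and `differing_buckets` are assumptions made for the example. Both sides hash their records into a fixed number of buckets, build a hash tree over the buckets, and then only the subtrees whose hashes differ need to be compared.

```python
import hashlib


def _h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def bucket_of(key: str, num_buckets: int = 8) -> int:
    # Stable bucket assignment (Python's built-in hash() is randomized per process).
    return int.from_bytes(_h(key.encode()), "big") % num_buckets


def build_tree(records: dict, num_buckets: int = 8) -> list:
    """Return the tree as a list of levels, leaves first. num_buckets must be a power of two."""
    buckets = [[] for _ in range(num_buckets)]
    for key in sorted(records):
        buckets[bucket_of(key, num_buckets)].append(f"{key}={records[key]}")
    level = [_h("\n".join(b).encode()) for b in buckets]
    levels = [level]
    while len(level) > 1:
        level = [_h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
        levels.append(level)
    return levels


def differing_buckets(tree_a: list, tree_b: list) -> list:
    """Walk down from the root, descending only into subtrees whose hashes differ."""
    suspects = [0]
    for depth in range(len(tree_a) - 1, 0, -1):
        suspects = [i for i in suspects if tree_a[depth][i] != tree_b[depth][i]]
        suspects = [child for i in suspects for child in (2 * i, 2 * i + 1)]
    return [i for i in suspects if tree_a[0][i] != tree_b[0][i]]


cassandra = {"a": "1", "b": "2", "c": "3", "stale": "9"}
external = {"a": "1", "b": "2", "c": "3"}

changed = differing_buckets(build_tree(cassandra), build_tree(external))
# Only keys in the differing buckets need a record-by-record comparison.
extra = [k for k in cassandra if bucket_of(k) in changed and k not in external]
print(extra)
```

The benefit is that when the two data sets are mostly identical, only the few differing buckets have to be exchanged and compared, rather than every key.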

Re: Finding records that exist on Cassandra but not externally

2016-09-07 Thread Jens Rantil
Hi Chris, Without fully knowing your use case: can't you keep track of which keys have changed in the external system somehow? Otherwise 2) sounds like the way to go to me. Cheers, Jens On Wed, Sep 7, 2016 at 9:47 AM wrote: > First off I hope this appropriate here- I couldn't decide whether thi

Finding records that exist on Cassandra but not externally

2016-09-07 Thread chris
First off, I hope this is appropriate here. I couldn't decide whether this was a question for Cassandra users or Spark users, so if you think it's in the wrong place, feel free to redirect me. I have a system that does a load of data manipulation using Spark. The output of this program is an effecti