I might be inclined to include a generation ID in the partition keys. Keep
a separate table where you bump the generation ID once your processing
is complete. You can even use CAS operations in case you goofed up and
started two generations at the same time (or your processing time exceeds
y
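
To make the generation-ID idea concrete, here is a rough sketch, not a definitive implementation. It uses an in-memory stand-in for the separate generation table; in Cassandra the compare-and-set step would presumably be a conditional update (a lightweight transaction such as UPDATE ... IF). The names GenerationTable, publish and read are hypothetical and exist only for illustration.

```python
import threading


class GenerationTable:
    """In-memory stand-in (an assumption) for the table holding the live generation ID."""

    def __init__(self):
        self._current = 0
        self._lock = threading.Lock()

    def current(self):
        with self._lock:
            return self._current

    def compare_and_set(self, expected, new):
        """Set the generation to `new` only if it still equals `expected`."""
        with self._lock:
            if self._current != expected:
                return False  # someone else published a generation first
            self._current = new
            return True


def publish(store, table, rows):
    """Write rows under a fresh generation, then flip the pointer with CAS."""
    base = table.current()
    generation = base + 1
    for key, value in rows.items():
        # The generation ID is part of the (simulated) partition key.
        store[(generation, key)] = value
    return table.compare_and_set(base, generation)


def read(store, table, key):
    """Readers only ever see rows from the currently published generation."""
    return store.get((table.current(), key))
```

Readers keep seeing the old generation until the pointer flips, and a failed compare_and_set tells a writer that another run got there first.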
Hi again Chris,
Another option would be to have a look at using a Merkle Tree to quickly
drill down to the differences. This is actually what Cassandra uses
internally when running a repair between different nodes.
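
A minimal sketch of the idea, assuming both sides can be dumped as key/value pairs and hashed into a fixed, power-of-two number of buckets. This is not Cassandra's actual repair code; the function names and the bucket scheme are made up for illustration.

```python
import hashlib


def digest(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def bucket_of(key: str, buckets: int) -> int:
    """Assign a key to a bucket from a hash of the key."""
    return int.from_bytes(digest(key.encode())[:4], "big") % buckets


def bucket_hashes(rows: dict, buckets: int) -> list:
    """Hash the sorted contents of each bucket; these become the tree's leaves."""
    contents = [[] for _ in range(buckets)]
    for key, value in rows.items():
        contents[bucket_of(key, buckets)].append(f"{key}={value}")
    return [digest("\n".join(sorted(c)).encode()) for c in contents]


def build_levels(leaves: list) -> list:
    """Return levels from root (index 0) down to the leaves (last index).

    Assumes len(leaves) is a power of two.
    """
    levels = [list(leaves)]
    while len(levels[0]) > 1:
        prev = levels[0]
        parents = [digest(prev[i] + prev[i + 1]) for i in range(0, len(prev), 2)]
        levels.insert(0, parents)
    return levels


def differing_buckets(a: list, b: list, level: int = 0, index: int = 0) -> list:
    """Descend only into subtrees whose hashes differ; return leaf indexes."""
    if a[level][index] == b[level][index]:
        return []  # identical subtree, nothing below needs comparing
    if level == len(a) - 1:
        return [index]
    return (differing_buckets(a, b, level + 1, 2 * index)
            + differing_buckets(a, b, level + 1, 2 * index + 1))


old = {f"k{i}": "v" for i in range(100)}
new = dict(old, k42="changed")
tree_old = build_levels(bucket_hashes(old, 16))
tree_new = build_levels(bucket_hashes(new, 16))
changed = differing_buckets(tree_old, tree_new)
```

Only the buckets listed in changed need a row-by-row comparison; everything under a matching hash is skipped.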
Cheers,
Jens
On Wed, Sep 7, 2016 at 9:47 AM wrote:
> First off I hope this appr
Hi Chris,
Without fully knowing your use case: can't you keep track of which keys have
changed in the external system somehow? Otherwise 2) sounds like the way to
go to me.
Cheers,
Jens
On Wed, Sep 7, 2016 at 9:47 AM wrote:
> First off I hope this appropriate here- I couldn't decide whether thi
First off, I hope this is appropriate here; I couldn't decide whether this was a
question for Cassandra users or Spark users, so if you think it's in the wrong
place feel free to redirect me.
I have a system that does a load of data manipulation using Spark. The output
of this program is a effecti