Hello,
I apologize for my very vague email; I shouldn't have written it in such
a hurry. I would like to clarify my use case and requirements, so that
maybe someone can give me some advice.
I am building a research version of Cassandra in which a missed write is
a normal case (e.g. out of n replicas, at least one is routinely expected
to miss a given write). I keep track of missed writes much as stock
Cassandra does for hinted handoff (a column family in the system keyspace
that stores serialized RowMutations). Later, when the nodes
that were missed are ready to receive writes again, the node caching the
RowMutations sends them one at a time until they have all been delivered.
This all happens in the context of a live, serving system.
My system works and does what it is supposed to; now I am trying to
improve performance. I currently have two optimizations in mind, but am
not sure how to approach them:
1) Reduce the number of RowMutations transferred by merging all
RowMutations for the same key and transmitting only one per key. If a
subset of keys is very popular, this minimizes how much I need to
transfer to bring a node back up to date. I am thinking I can go
inside each RowMutation and merge its ColumnFamily objects, then create a
new RowMutation with the merged CFs. Is ColumnFamily.diff() the right way
to merge an individual CF, or am I misunderstanding it?
2) Serialize a whole bunch of RowMutations into a chunk, stream the
chunk to the appropriate node, deserialize them, and apply them
individually. In this case, I would avoid having to wait for an ACK on
each mutation, and could more efficiently send lots of data. Is this
feasible with the existing streaming infrastructure, or would I have to
implement a new facility?
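
To make the two ideas concrete, here is a minimal, self-contained sketch in
plain Java. It does not use any Cassandra classes: Mutation, Cell,
mergeByKey, toChunk and fromChunk are hypothetical stand-ins invented for
illustration, and the merge rule (highest timestamp wins per column, later
write wins on a tie) is a simplification that ignores deletions and
whatever reconciliation rules the real ColumnFamily applies. The framing
format (a count followed by length-prefixed payloads) is likewise just an
assumption about what a "chunk" could look like.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class HintBatchSketch {

    /** Hypothetical stand-in for one column write: a value and its timestamp. */
    static final class Cell {
        final String value;
        final long timestamp;

        Cell(String value, long timestamp) {
            this.value = value;
            this.timestamp = timestamp;
        }
    }

    /** Hypothetical stand-in for a saved mutation: a row key and its columns. */
    static final class Mutation {
        final String key;
        final Map<String, Cell> columns = new LinkedHashMap<>();

        Mutation(String key) {
            this.key = key;
        }

        /** Adds a column write; the higher timestamp wins, ties go to the newer call. */
        Mutation put(String column, String value, long timestamp) {
            columns.merge(column, new Cell(value, timestamp),
                    (old, neu) -> neu.timestamp >= old.timestamp ? neu : old);
            return this;
        }
    }

    /** Idea 1: collapse the saved mutations to a single mutation per row key. */
    static List<Mutation> mergeByKey(List<Mutation> saved) {
        Map<String, Mutation> merged = new LinkedHashMap<>();
        for (Mutation m : saved) {
            Mutation target = merged.computeIfAbsent(m.key, Mutation::new);
            for (Map.Entry<String, Cell> e : m.columns.entrySet()) {
                target.put(e.getKey(), e.getValue().value, e.getValue().timestamp);
            }
        }
        return new ArrayList<>(merged.values());
    }

    /** Idea 2: frame already-serialized mutations as count + length-prefixed blobs. */
    static byte[] toChunk(List<byte[]> serialized) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(bytes);
        out.writeInt(serialized.size());
        for (byte[] blob : serialized) {
            out.writeInt(blob.length);
            out.write(blob);
        }
        out.flush();
        return bytes.toByteArray();
    }

    /** Receiver side: split a chunk back into blobs to deserialize and apply one by one. */
    static List<byte[]> fromChunk(byte[] chunk) throws IOException {
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(chunk));
        int count = in.readInt();
        List<byte[]> blobs = new ArrayList<>(count);
        for (int i = 0; i < count; i++) {
            byte[] blob = new byte[in.readInt()];
            in.readFully(blob);
            blobs.add(blob);
        }
        return blobs;
    }
}
```

With this framing the sender would wait for one acknowledgement per chunk
rather than one per mutation. Whether such a chunk can ride on the existing
streaming code in 1.1.6 is exactly the open question above; the sketch only
shows the framing, not the transport.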
Again, my codebase is on top of Cassandra 1.1.6. I would very much
appreciate any insight anyone could give me.
Thanks very much,
Bill Katsak
On 04/08/2013 12:10 PM, William Katsak wrote:
Hello,
I am sorry to bother the list with this question, but I was wondering,
assuming I have many saved (small) mutations (of the type that hinted
handoff uses), is there any easy way to put these all together and
bulk transmit (stream) them to a destination node?
My codebase is based on Cassandra 1.1.6.
Thanks very much in advance,
Bill Katsak