Hello,

I apologize for my very vague email; I shouldn't have written it in such a hurry. I would like to clarify my use case and requirements so that someone may be able to give me some advice.

I am building a research version of Cassandra in which a missed write is a normal case (e.g. out of n replicas, it would be normal for at least one of them to miss a write). I keep track of missed writes much as default Cassandra does for HintedHandoff (a column family in the system keyspace that stores serialized RowMutations). Later, when the nodes that missed writes are ready to receive them again, the node caching the RowMutations sends them one at a time until they have all been delivered. This all happens in the context of a live, serving system.

My system works and does what it is supposed to; now I am trying to improve performance. I currently have two optimizations in mind, but I am not sure how to approach them:

1) Minimize the transfer of redundant RowMutations by merging all RowMutations for the same key and transmitting only one per key. If a subset of keys is very popular, this minimizes how much I need to transfer to bring a node back up to date. I am thinking I can go inside each RowMutation and merge each ColumnFamily, then create a new RowMutation with the merged CFs. Is ColumnFamily.diff() the right way to merge an individual CF, or am I misunderstanding it?

2) Serialize a large batch of RowMutations into a chunk, stream the chunk to the appropriate node, deserialize the mutations there, and apply them individually. This way I would avoid waiting for an ACK on each mutation and could send large amounts of data more efficiently. Is this feasible with the existing streaming infrastructure, or would I have to implement a new facility?
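
To make the two ideas concrete, here is a rough sketch in plain Java. It deliberately does not use any Cassandra classes: Mutation, mergeByKey, and toChunk are hypothetical stand-ins for a RowMutation, the per-key merge, and the chunk serializer. It also assumes a simple "newest timestamp wins" rule per column and ignores deletions, which may not match how Cassandra reconciles columns.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Illustrative only: these types are hypothetical, not Cassandra APIs.
class HintBatchSketch {

    // One stored write: row key, column name, value, and write timestamp.
    static final class Mutation {
        final String key;
        final String column;
        final String value;
        final long timestamp;

        Mutation(String key, String column, String value, long timestamp) {
            this.key = key;
            this.column = column;
            this.value = value;
            this.timestamp = timestamp;
        }
    }

    // Idea 1: collapse all hints for a key, keeping only the newest value
    // per column (assumed rule: strictly larger timestamp wins).
    static Map<String, Map<String, Mutation>> mergeByKey(List<Mutation> hints) {
        Map<String, Map<String, Mutation>> merged = new LinkedHashMap<>();
        for (Mutation m : hints) {
            Map<String, Mutation> columns =
                merged.computeIfAbsent(m.key, k -> new LinkedHashMap<>());
            Mutation existing = columns.get(m.column);
            if (existing == null || m.timestamp > existing.timestamp) {
                columns.put(m.column, m);
            }
        }
        return merged;
    }

    // Idea 2: write every merged row into one count-prefixed byte chunk,
    // so the receiver can be sent a single payload instead of one message
    // (and one ACK) per mutation.
    static byte[] toChunk(Map<String, Map<String, Mutation>> merged) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeInt(merged.size());                 // number of rows
            for (Map.Entry<String, Map<String, Mutation>> row : merged.entrySet()) {
                out.writeUTF(row.getKey());              // row key
                out.writeInt(row.getValue().size());     // columns in this row
                for (Mutation m : row.getValue().values()) {
                    out.writeUTF(m.column);
                    out.writeUTF(m.value);
                    out.writeLong(m.timestamp);
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    public static void main(String[] args) {
        List<Mutation> hints = Arrays.asList(
            new Mutation("user1", "email", "old@example.com", 1L),
            new Mutation("user1", "email", "new@example.com", 2L),
            new Mutation("user2", "name", "Bill", 5L));
        Map<String, Map<String, Mutation>> merged = mergeByKey(hints);
        byte[] chunk = toChunk(merged);
        System.out.println(merged.size() + " rows, " + chunk.length + " bytes");
    }
}
```

In this sketch the three stored hints collapse to two rows before anything is sent, and the receiver would read the chunk back with a DataInputStream in the same field order. Whether the real merge and transfer can reuse existing Cassandra facilities is exactly the open question above.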

Again, my codebase is on top of Cassandra 1.1.6. I would very much appreciate any insight anyone could give me.

Thanks very much,
Bill Katsak

On 04/08/2013 12:10 PM, William Katsak wrote:
Hello,

I am sorry to bother the list with this question, but I was wondering: assuming I have many saved (small) mutations (of the type that hinted handoff uses), is there an easy way to put them all together and bulk transmit (stream) them to a destination node?

My codebase is based on Cassandra 1.1.6.

Thanks very much in advance,
Bill Katsak



