Hi,

On Thursday, 2 August 2012 at 11:47, Owen Davies wrote:

> We want to store a large number of columns in a single row (up to about 
> 100,000,000), where each value is roughly 10 bytes.
>  
> We also need to be able to get slices of columns from any point in the row.
>  
> We haven't found a problem with smaller amounts of data so far, but can 
> anyone think of any reason why this would be a bad idea or cause large 
> performance problems?

My experience with wide rows and Cassandra has not been positive. We used to 
have rows of a few hundred megabytes each, read during MapReduce computation, 
and that caused many issues, especially timeouts when reading the rows (with 
Cassandra under a medium write load) and OutOfMemory exceptions.

The solution in our case was to "shard" (timebucket) the rows into smaller 
pieces (a few megabytes each).
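
For illustration only, here is a minimal sketch of what that time-bucketing 
could look like, assuming a Python client; the hourly bucket size and the key 
format are hypothetical choices, not what we ran in production:

    from datetime import datetime, timezone

    BUCKET_SECONDS = 3600  # hypothetical bucket size: one row per key per hour

    def bucketed_row_key(base_key, event_time):
        # Append the bucket start (epoch seconds) to the base key, so columns
        # written within the same hour land in the same, smaller row. Reading
        # a time slice then means reading only the buckets that overlap it.
        epoch = int(event_time.replace(tzinfo=timezone.utc).timestamp())
        bucket = epoch - (epoch % BUCKET_SECONDS)
        return "%s:%d" % (base_key, bucket)

    # Two events an hour apart end up in different (smaller) rows:
    print(bucketed_row_key("sensor-42", datetime(2012, 8, 2, 11, 15)))
    print(bucketed_row_key("sensor-42", datetime(2012, 8, 2, 12, 15)))

The trade-off is that reads of a slice have to fan out over every bucket the 
slice overlaps, which is the price you pay for keeping each row small.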

The situation might have changed with Cassandra 1.1.0, which claims to have 
some "wide row" support, but I haven't been able to test that.

>  
> If breaking up the row is something we should do, what is the maximum number 
> of columns we should have?
>  
> We are not too worried if there is only a small performance decrease; adding 
> more nodes to the cluster would be an option to help keep the code simpler.

I don't have a precise figure, but I'd limit row size to less than 100 MB… much 
less, if possible. In general, my experience is that hundreds of millions of 
small rows don't cause issues, but having just a few very wide rows will cause 
timeouts and, in the worst cases, OOM.


--  
Filippo Diotalevi
