Re: How to model data to achieve specific data locality

2014-12-09 Kai Wang
ay “The typical read is to load a subset of sequences with the same seq_id”, what type of “subset” are you talking about? Again, a few explicit and concise example queries (in some concise, easy to read pseudo language or even plain Eng

Re: How to model data to achieve specific data locality

2014-12-08 Eric Stevens
even plain English, but not belabored with full CQL syntax.) would be very helpful. I mean, Cassandra has no “subset” concept, nor a “load subset” command, so what are we really talking about? Also, I presume we are talking CQL, but some of t

Re: How to model data to achieve specific data locality

2014-12-07 Jonathan Haddad
nor a “load subset” command, so what are we really talking about? Also, I presume we are talking CQL, but some of the references seem more Thrift/slice oriented. -- Jack Krupansky From: Eric Stevens Sent: Sunday, De

Re: How to model data to achieve specific data locality

2014-12-07 Jack Krupansky
Re: How to model data to achieve specific data locality. Thanks for the help. I wasn't clear on how clustering columns work. Coming from a Thrift background, it took me a while to understand how a clustering column affects partition storage on disk. Now I believe using seq_type as the first clust

Re: How to model data to achieve specific data locality

2014-12-07 Kai Wang
a “load subset” command, so what are we really talking about? Also, I presume we are talking CQL, but some of the references seem more Thrift/slice oriented. -- Jack Krupansky From: Eric Stevens Sent: Sunday, December 7, 2014 10:12 AM To: us

Re: How to model data to achieve specific data locality

2014-12-07 Jack Krupansky
more Thrift/slice oriented. -- Jack Krupansky From: Eric Stevens Sent: Sunday, December 7, 2014 10:12 AM To: user@cassandra.apache.org Subject: Re: How to model data to achieve specific data locality > Also new seq_types can be added and old seq_types can be deleted. This means I ofte

Re: How to model data to achieve specific data locality

2014-12-07 Eric Stevens
> Also new seq_types can be added and old seq_types can be deleted. This means I often need to ALTER TABLE to add and drop columns. Kai, unless I'm misunderstanding something, I don't see why you need to alter the table to add a new seq type. From a data model perspective, these are just new valu
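A toy in-memory sketch of the distinction Eric draws, written in plain Python rather than against Cassandra or any driver (the class and method names are hypothetical). When each seq_type is its own column, the set of columns is schema, so a new type needs a schema change first. When seq_type is a clustering column, a new type is just a new row.

```python
class ColumnPerTypeTable:
    """Hypothetical layout with one column per seq_type: the column set is schema."""

    def __init__(self, seq_types):
        self.columns = set(seq_types)
        self.rows = {}  # seq_id -> {seq_type: data}

    def write(self, seq_id, seq_type, data):
        if seq_type not in self.columns:
            # In CQL this is where an ALTER TABLE ... ADD would be needed first.
            raise KeyError(f"no column for seq_type {seq_type!r}; schema change required")
        self.rows.setdefault(seq_id, {})[seq_type] = data


class ClusteredTable:
    """Hypothetical layout with PRIMARY KEY ((seq_id), seq_type): seq_type is a value."""

    def __init__(self):
        self.rows = {}  # (seq_id, seq_type) -> data

    def write(self, seq_id, seq_type, data):
        # Any new seq_type is simply a new row; the schema is untouched.
        self.rows[(seq_id, seq_type)] = data

    def delete(self, seq_id, seq_type):
        # Retiring a seq_type is a row delete, not a dropped column.
        self.rows.pop((seq_id, seq_type), None)
```

For example, `ClusteredTable().write("s1", "brand_new_type", "ACGT")` succeeds with no schema change, while the same write against a `ColumnPerTypeTable` built without that type raises.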

Re: How to model data to achieve specific data locality

2014-12-07 DuyHai Doan
"Those sequences are not fixed. All sequences with the same seq_id tend to grow at the same rate. If it's one partition per seq_id, the size will most likely exceed the threshold quickly" --> Then use bucketing to avoid too wide partitions "Also new seq_types can be added and old seq_types can be
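One way to sketch the bucketing suggestion is to add a bucket number to the partition key, for instance a hypothetical PRIMARY KEY ((seq_id, bucket), seq_type, position). This assumes each element of a growing sequence has an integer position and that buckets are fixed-size position ranges. `BUCKET_SIZE`, `partition_key` and `buckets_for_range` are illustrative names, and the right bucket size would depend on the partition-size threshold being targeted.

```python
BUCKET_SIZE = 10_000  # hypothetical: maximum positions stored per partition bucket


def partition_key(seq_id: str, position: int) -> tuple[str, int]:
    """Composite partition key (seq_id, bucket) for a row at the given position."""
    if position < 0:
        raise ValueError("position must be non-negative")
    return (seq_id, position // BUCKET_SIZE)


def buckets_for_range(start: int, end: int) -> list[int]:
    """Bucket numbers a reader must query to cover positions in [start, end)."""
    if end <= start:
        return []
    return list(range(start // BUCKET_SIZE, (end - 1) // BUCKET_SIZE + 1))
```

A range read then issues one query per returned bucket. The bucket size trades partition width against read fan-out: smaller buckets keep partitions narrow but make long reads touch more partitions.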

Re: How to model data to achieve specific data locality

2014-12-06 Kai Wang
On Sat, Dec 6, 2014 at 11:18 AM, Eric Stevens wrote: > It depends on the size of your data, but if your data is reasonably small, there should be no trouble including thousands of records on the same partition key. So a data model using PRIMARY KEY ((seq_id), seq_type) ought to work fine.

Re: How to model data to achieve specific data locality

2014-12-06 Eric Stevens
It depends on the size of your data, but if your data is reasonably small, there should be no trouble including thousands of records on the same partition key. So a data model using PRIMARY KEY ((seq_id), seq_type) ought to work fine. If the data size per partition exceeds some threshold that rep
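To make the locality argument concrete, here is a toy in-memory model of PRIMARY KEY ((seq_id), seq_type). It is plain Python, not Cassandra, and `ToyTable` with its methods is hypothetical. Rows are grouped into one partition per seq_id and kept sorted by the clustering column, so reading a range of seq_types for one seq_id touches a single partition and returns one contiguous run of rows.

```python
import bisect
from collections import defaultdict


class ToyTable:
    """Toy model of PRIMARY KEY ((seq_id), seq_type): one partition per seq_id,
    rows inside a partition kept sorted by the clustering column seq_type."""

    def __init__(self):
        self.partitions = defaultdict(list)  # seq_id -> sorted [(seq_type, data)]

    def insert(self, seq_id, seq_type, data):
        part = self.partitions[seq_id]
        # (seq_type,) sorts before any (seq_type, data), so this finds the first
        # row whose seq_type is >= the one being written.
        i = bisect.bisect_left(part, (seq_type,))
        if i < len(part) and part[i][0] == seq_type:
            part[i] = (seq_type, data)  # overwrite, like a CQL INSERT (upsert)
        else:
            part.insert(i, (seq_type, data))

    def slice(self, seq_id, lo, hi):
        """Rows with lo <= seq_type < hi: one partition, one contiguous run."""
        part = self.partitions.get(seq_id, [])
        start = bisect.bisect_left(part, (lo,))
        end = bisect.bisect_left(part, (hi,))
        return part[start:end]
```

Inserting types "b", "a", "c" under "s1" and then calling `slice("s1", "a", "c")` returns the "a" and "b" rows in order, without consulting any other seq_id's partition.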

How to model data to achieve specific data locality

2014-12-05 Kai Wang
I have a data model question. I am trying to figure out how to model the data to achieve the best data locality for analytic purposes. Our application processes sequences. Each sequence has a unique key in the format of [seq_id]_[seq_type]. For any given seq_id, there is an unlimited number of seq_typ
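If the existing [seq_id]_[seq_type] keys are split into their two parts, seq_id can serve as the partition key and seq_type as a clustering column, which is the layout the replies converge on. A minimal sketch, assuming (hypothetically) that seq_id never contains an underscore while seq_type may; `split_key` is an illustrative name.

```python
def split_key(key: str) -> tuple[str, str]:
    """Split a '[seq_id]_[seq_type]' key into (seq_id, seq_type).

    Assumes seq_id itself never contains '_'; everything after the first
    underscore is treated as seq_type.
    """
    seq_id, sep, seq_type = key.partition("_")
    if not sep or not seq_id or not seq_type:
        raise ValueError(f"malformed key: {key!r}")
    return seq_id, seq_type
```

If the opposite holds in the real data (seq_type never contains an underscore), `key.rpartition("_")` would be the matching choice.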