Re: Data Modeling: Partition Size and Query Efficiency

2016-01-06 Thread Jim Ancona
On Tue, Jan 5, 2016 at 5:52 PM, Jonathan Haddad wrote:
> You could keep a "num_buckets" value associated with the client's account, which can be adjusted accordingly as usage increases.

Yes, but the adjustment problem is tricky when there are multiple concurrent writers. What happens when you…
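To make the hazard concrete: two writers can both read num_buckets = 1, both decide to grow it, and then disagree about where rows live. One common mitigation (a sketch, not something proposed in this thread; the account_meta table and its columns are invented for illustration) is a Cassandra lightweight transaction, so that only one writer's adjustment applies:

    # Hypothetical metadata table:
    #   CREATE TABLE account_meta (account_id text PRIMARY KEY, num_buckets int)
    GROW_CQL = """
        UPDATE account_meta
           SET num_buckets = %(new)s
         WHERE account_id = %(acct)s
            IF num_buckets = %(expected)s
    """

    def try_grow(session, acct, expected, new):
        """Compare-and-set via a lightweight transaction: exactly one of
        several concurrent callers wins. Assumes the DataStax Python driver,
        where ResultSet.was_applied reflects the [applied] column returned
        by a conditional update."""
        rs = session.execute(GROW_CQL,
                             {"new": new, "acct": acct, "expected": expected})
        return rs.was_applied  # losers should re-read num_buckets and retry

Even with the compare-and-set, readers still have to cope with rows written under the old bucket count, which is exactly the migration problem being raised here.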

Re: Data Modeling: Partition Size and Query Efficiency

2016-01-05 Thread Jonathan Haddad
You could keep a "num_buckets" value associated with the client's account, which can be adjusted accordingly as usage increases.

On Tue, Jan 5, 2016 at 2:17 PM Jim Ancona wrote:
> On Tue, Jan 5, 2016 at 4:56 PM, Clint Martin <clintlmar...@coolfiretechnologies.com> wrote:
>> What sort of data…
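A sketch of what that suggestion might look like on the write path, assuming a hypothetical account_meta table holding num_buckets, an objects table keyed by (customer_id, bucket), and the DataStax Python driver:

    import uuid

    LOOKUP_CQL = "SELECT num_buckets FROM account_meta WHERE account_id = %s"
    INSERT_CQL = """
        INSERT INTO objects (customer_id, bucket, object_id, data)
        VALUES (%s, %s, %s, %s)
    """

    def write_object(session, customer_id, data):
        # Look up this customer's current bucket count; default to a
        # single partition if no metadata row exists yet.
        row = next(iter(session.execute(LOOKUP_CQL, (customer_id,))), None)
        num_buckets = row.num_buckets if row else 1
        object_id = uuid.uuid4()
        bucket = object_id.int % num_buckets  # deterministic routing
        session.execute(INSERT_CQL, (customer_id, bucket, object_id, data))
        return object_id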

Re: Data Modeling: Partition Size and Query Efficiency

2016-01-05 Thread Jim Ancona
On Tue, Jan 5, 2016 at 4:56 PM, Clint Martin <clintlmar...@coolfiretechnologies.com> wrote:
> What sort of data is your clustering key composed of? That might help some in determining a way to achieve what you're looking for.

Just a UUID that acts as an object identifier.

> Clint
> On Jan…

Re: Data Modeling: Partition Size and Query Efficiency

2016-01-05 Thread Clint Martin
What sort of data is your clustering key composed of? That might help some in determining a way to achieve what you're looking for.

Clint

On Jan 5, 2016 2:28 PM, "Jim Ancona" wrote:
> Hi Nate,
>
> Yes, I've been thinking about treating customers as either small or big, where "small" ones have…

Re: Data Modeling: Partition Size and Query Efficiency

2016-01-05 Thread Jim Ancona
Hi Nate,

Yes, I've been thinking about treating customers as either small or big, where "small" ones have a single partition and big ones have 50 (or whatever number I need to keep sizes reasonable). There's still the problem of how to handle a small customer who becomes too big, but that will happen…
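The arithmetic behind "1 for small customers, 50 for big ones" is just a ceiling division against the target partition size. A sketch, using the ~50 MB figure from this thread:

    import math

    TARGET_PARTITION_BYTES = 50 * 1024 * 1024  # ~50 MB target per partition

    def buckets_needed(estimated_customer_bytes):
        # A customer at or under 50 MB gets exactly one partition; a
        # customer with ~2.5 GB of data gets roughly 50.
        return max(1, math.ceil(estimated_customer_bytes / TARGET_PARTITION_BYTES))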

Re: Data Modeling: Partition Size and Query Efficiency

2016-01-05 Thread Jim Ancona
Hi Jack,

Thanks for your response. My answers inline...

On Tue, Jan 5, 2016 at 11:52 AM, Jack Krupansky wrote:
> Jim, I don't quite get why you think you would need to query 50 partitions to return merely hundreds or thousands of rows. Please elaborate. I mean, sure, for that extreme 100th percentile…

Re: Data Modeling: Partition Size and Query Efficiency

2016-01-05 Thread Nate McCall
> In this case, 99% of my data could fit in a single 50 MB partition. But if I use the standard approach, I have to split my partitions into 50 pieces to accommodate the largest data. That means that to query the 700 rows for my median case, I have to read 50 partitions instead of one.
> …

Re: Data Modeling: Partition Size and Query Efficiency

2016-01-05 Thread Jack Krupansky
Jim, I don't quite get why you think you would need to query 50 partitions to return merely hundreds or thousands of rows. Please elaborate. I mean, sure, for that extreme 100th percentile, yes, you would query a lot of partitions, but for the 90th percentile it would be just one. Even the 99th percentile…
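The percentile point is visible in read code: the fan-out cost is proportional to num_buckets, which would be 1 for the vast majority of customers. A sketch of a parallel multi-partition read, assuming the DataStax Python driver and the hypothetical objects table sketched elsewhere in this thread:

    SELECT_CQL = ("SELECT object_id, data FROM objects "
                  "WHERE customer_id = %s AND bucket = %s")

    def read_all(session, customer_id, num_buckets):
        # One async query per bucket. For the 90th-percentile customer
        # num_buckets == 1, so this degenerates to a single-partition read.
        futures = [session.execute_async(SELECT_CQL, (customer_id, b))
                   for b in range(num_buckets)]
        return [row for f in futures for row in f.result()]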

Re: Data Modeling: Partition Size and Query Efficiency

2016-01-05 Thread Jim Ancona
Thanks for responding! My natural partition key is a customer id. Our customers have widely varying amounts of data. Since the vast majority of them have data that's small enough to fit in a single partition, I'd like to avoid imposing unnecessary overhead on the 99% just to avoid issues with the…

Re: Data Modeling: Partition Size and Query Efficiency

2016-01-04 Thread Clint Martin
You should endeavor to use a repeatable method of segmenting your data. Swapping partitions every time you "fill one" seems like an anti-pattern to me, but I suppose it really depends on what your primary key is. Can you share some more information on this?

In the past I have utilized the consistent…
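A "repeatable method" here usually means deriving the bucket from the key itself rather than from mutable fill state. A sketch (the hashing choice is mine, for illustration):

    import hashlib
    import uuid

    def bucket_for(object_id, num_buckets):
        # Repeatable segmentation: the bucket is a pure function of the
        # key, so any reader or writer that knows num_buckets computes the
        # same answer, with no "currently open partition" state to share.
        digest = hashlib.md5(object_id.bytes).digest()
        return int.from_bytes(digest[:4], "big") % num_buckets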

Data Modeling: Partition Size and Query Efficiency

2016-01-04 Thread Jim Ancona
A problem that I have run into repeatedly when doing schema design is how to control partition size while still allowing for efficient multi-row queries. We want to limit partition size to some number between 10 and 100 megabytes to avoid operational issues. The standard way to do that is to figure…
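For reference, the "standard approach" discussed throughout this thread looks roughly like the following (a sketch; the table and column names are invented). An artificial bucket column in the partition key splits each customer's rows across a fixed number of partitions to cap their size, at the cost of multi-partition reads:

    FIXED_BUCKETS = 50  # fixed split, sized for the largest customer

    CREATE_TABLE_CQL = """
        CREATE TABLE objects (
            customer_id text,
            bucket      int,    -- 0 .. FIXED_BUCKETS - 1
            object_id   uuid,
            data        blob,
            PRIMARY KEY ((customer_id, bucket), object_id)
        )
    """

    def bucket_for(object_id, num_buckets=FIXED_BUCKETS):
        # Writes spread rows deterministically across the buckets; a
        # "fetch everything for this customer" read must query all of them.
        return object_id.int % num_buckets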