On Tue, Jan 5, 2016 at 5:52 PM, Jonathan Haddad wrote:
> You could keep a "num_buckets" value associated with the client's account,
> which can be adjusted accordingly as usage increases.
>
Yes, but the adjustment problem is tricky when there are multiple
concurrent writers. What happens when you need to adjust the value while
other clients are still writing with the old one?
You could keep a "num_buckets" value associated with the client's account,
which can be adjusted accordingly as usage increases.
On Tue, Jan 5, 2016 at 4:56 PM, Clint Martin <
clintlmar...@coolfiretechnologies.com> wrote:
> What sort of data is your clustering key composed of? That might help some
> in determining a way to achieve what you're looking for.
>
Just a UUID that acts as an object identifier.
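
For concreteness, a minimal sketch of the table shape being discussed, with
the UUID object id as the clustering column and a synthetic bucket in the
partition key; the keyspace, table, and column names are assumptions:

# Sketch of an assumed table shape: (customer_id, bucket) as the partition
# key so a big customer's data can be split, and the UUID object id as the
# clustering column mentioned above.
from cassandra.cluster import Cluster

session = Cluster(["127.0.0.1"]).connect("my_keyspace")

session.execute("""
    CREATE TABLE IF NOT EXISTS objects (
        customer_id text,
        bucket      int,   -- 0 for small customers, 0..num_buckets-1 for big
        object_id   uuid,  -- the object identifier used as the clustering key
        payload     blob,
        PRIMARY KEY ((customer_id, bucket), object_id)
    )
""")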
What sort of data is your clustering key composed of? That might help some
in determining a way to achieve what you're looking for.
Clint
Hi Nate,
Yes, I've been thinking about treating customers as either small or big,
where "small" ones have a single partition and big ones have 50 (or
whatever number I need to keep sizes reasonable). There's still the problem
of how to handle a small customer who becomes too big, but that will happen
rarely.
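
A rough sketch of that small/big split; the 50-bucket figure is the one from
this thread, while the way big customers are identified here is only an
illustrative assumption:

# Sketch: "small" customers keep everything in a single partition (bucket 0),
# "big" customers get a fixed number of buckets. How a customer gets flagged
# as big (here: membership in a set) is an illustrative assumption.
SMALL_BUCKETS = 1
BIG_BUCKETS = 50   # or whatever keeps partitions in the 10-100 MB range

def num_buckets_for(customer_id, big_customers):
    return BIG_BUCKETS if customer_id in big_customers else SMALL_BUCKETS

# Every read and write first resolves num_buckets_for(...) and then works
# with buckets 0 .. n-1. The unsolved part is promoting a small customer to
# big: rows written while it was small all live in bucket 0.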
Hi Jack,
Thanks for your response. My answers inline...
On Tue, Jan 5, 2016 at 11:52 AM, Jack Krupansky
wrote:
> Jim, I don't quite get why you think you would need to query 50 partitions
> to return merely hundreds or thousands of rows. Please elaborate. I mean,
> sure, for that extreme 100th percentile, yes, you would query a lot of
> partitions, but for the 90th percentile it would be just one.

In this case, 99% of my data could fit in a single 50 MB partition. But if
I use the standard approach, I have to split my partitions into 50 pieces
to accommodate the largest data. That means that to query the 700 rows for
my median case, I have to read 50 partitions instead of one.
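
To make the 50-versus-1 read cost concrete, a sketch of the fan-out read
with the DataStax Python driver, reusing the assumed table shape from the
earlier sketch:

# Sketch: reading all rows for a customer whose data is spread over
# num_buckets partitions. A small customer (num_buckets == 1) is a single
# partition read; a big one fans out to 50 queries that are merged here.
from cassandra.cluster import Cluster

session = Cluster(["127.0.0.1"]).connect("my_keyspace")
stmt = session.prepare(
    "SELECT object_id, payload FROM objects "
    "WHERE customer_id = ? AND bucket = ?")

def read_customer(customer_id, num_buckets):
    futures = [session.execute_async(stmt, (customer_id, b))
               for b in range(num_buckets)]
    rows = []
    for f in futures:
        rows.extend(f.result())   # blocks until that bucket's query returns
    return rows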
Jim, I don't quite get why you think you would need to query 50 partitions
to return merely hundreds or thousands of rows. Please elaborate. I mean,
sure, for that extreme 100th percentile, yes, you would query a lot of
partitions, but for the 90th percentile it would be just one. Even the 99th
percentile should only be a handful of partitions.
Thanks for responding!
My natural partition key is a customer id. Our customers have widely
varying amounts of data. Since the vast majority of them have data that's
small enough to fit in a single partition, I'd like to avoid imposing
unnecessary overhead on the 99% just to avoid issues with the largest 1%.
You should endeavor to use a repeatable method of segmenting your data.
Swapping partitions every time you "fill one" seems like an anti-pattern to
me, but I suppose it really depends on what your primary key is. Can you
share some more information on this?
In the past I have utilized consistent hashing for this.
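
A minimal sketch of one repeatable way to segment, assuming the UUID object
id mentioned earlier; the md5-based bucket function is an illustration, not
something described in this thread:

# Sketch: derive the bucket deterministically from the object id, so the
# same row always maps to the same partition and a point read never has to
# scan every bucket. md5 of the UUID bytes is used because Python's built-in
# hash() is randomized per process and is not repeatable across writers.
import hashlib
import uuid

def bucket_for(object_id: uuid.UUID, num_buckets: int) -> int:
    if num_buckets <= 1:
        return 0
    digest = hashlib.md5(object_id.bytes).digest()
    return int.from_bytes(digest[:8], "big") % num_buckets

# Point read for one object, with num_buckets taken from the customer record:
#   SELECT payload FROM objects
#    WHERE customer_id = ? AND bucket = ? AND object_id = ?
# using bucket = bucket_for(object_id, num_buckets)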
A problem that I have run into repeatedly when doing schema design is how
to control partition size while still allowing for efficient multi-row
queries.
We want to limit partition size to some number between 10 and 100 megabytes
to avoid operational issues. The standard way to do that is to figure out
how many pieces the largest partitions need to be split into and add a
synthetic bucket component to the partition key.
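
For concreteness, the sizing arithmetic behind "figure out how many pieces
you need" might look like the sketch below; the byte figures are
illustrative, chosen only to be consistent with the 50 MB / 50-piece numbers
mentioned elsewhere in the thread:

# Sketch: choose a bucket count from the worst-case customer size and the
# partition-size ceiling. The example sizes are made-up illustrations.
import math

TARGET_PARTITION_BYTES = 50 * 1000 * 1000   # aim for roughly 50 MB partitions

def buckets_needed(estimated_customer_bytes):
    return max(1, math.ceil(estimated_customer_bytes / TARGET_PARTITION_BYTES))

print(buckets_needed(30 * 1000 * 1000))     # a median customer    -> 1
print(buckets_needed(2_500_000_000))        # the largest customer -> 50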