Hello,
I am trying to understand concretely how DTCS makes buckets. I have been looking
at the DateTieredCompactionStrategyTest.testGetBuckets method and playing with
some of the parameters to the getBuckets call (Cassandra 2.1.12).
I don't think I fully understand what happens there. Let me try to explain.
Consider the second test there. I changed the pairs a bit for easier
explanation and set base (initial window size) to 1000L and min_threshold to 2:
pairs = Lists.newArrayList(
    Pair.create("a", 200L),
    Pair.create("b", 2000L),
    Pair.create("c", 3600L),
    Pair.create("d", 3899L),
    Pair.create("e", 3900L),
    Pair.create("f", 3950L),
    Pair.create("too new", 4125L)
);
buckets = getBuckets(pairs, 1000L, 2, 4050L, Long.MAX_VALUE);
In this case, I would expect the buckets to look like [0-4000] [4000-]. Is this
correct? The buckets that I get back are different ("a" lives in its own bucket
and everyone else in another). What am I missing here?
Another case:
pairs = Lists.newArrayList(
    Pair.create("a", 200L),
    Pair.create("b", 2000L),
    Pair.create("c", 3600L),
    Pair.create("d", 3899L),
    Pair.create("e", 3900L),
    Pair.create("f", 3950L),
    Pair.create("too new", 4125L)
);
buckets = getBuckets(pairs, 50L, 4, 4050L, Long.MAX_VALUE);
Here, I would expect the buckets to be [0-3200] [3200-4000] [4000-4050] [4050-].
Is this correct? Again, the buckets that come back are quite different.
Note that if I keep base at its original value (100L), or increase it, and play
with min_threshold, the results are exactly what I would expect.
The way I think about DTCS is: try to make buckets of the maximum possible size
starting from 0, and once you can't do that, make smaller buckets (similar to
what the comment suggests). Is this mental model wrong? I am afraid that the
math in the Target class is somewhat hard to follow, so I am thinking about it
this way.
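To make that mental model concrete, here is a small sketch of it in Java. It is
not the logic of the Target class, only my own way of picturing the windows; the
class MentalModelBuckets and the method windowStarts are names I made up for
illustration. It starts at 0, takes the largest window of size
base * min_threshold^k that still fits before now, and falls back to smaller
windows once that size no longer fits.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Cassandra. It encodes only the mental model
// described above, not what DateTieredCompactionStrategy actually does.
final class MentalModelBuckets
{
    // Returns the lower bound of each window, oldest first.
    // The last window is open-ended (it has no upper bound).
    static List<Long> windowStarts(long base, int minThreshold, long now)
    {
        if (base <= 0 || minThreshold < 2 || now < 0)
            throw new IllegalArgumentException("need base > 0, minThreshold >= 2, now >= 0");

        // Largest size of the form base * minThreshold^k with size <= now
        // (or just base, if even base does not fit).
        long size = base;
        while (size <= now / minThreshold)
            size *= minThreshold;

        List<Long> starts = new ArrayList<>();
        long start = 0;
        starts.add(start);
        while (size >= base)
        {
            // Take as many windows of the current size as fit before now.
            while (now - start >= size)
            {
                start += size;
                starts.add(start);
            }
            // Then fall back to the next smaller window size.
            size /= minThreshold;
        }
        return starts;
    }

    public static void main(String[] args)
    {
        System.out.println(windowStarts(1000L, 2, 4050L)); // [0, 4000]
        System.out.println(windowStarts(50L, 4, 4050L));   // [0, 3200, 4000, 4050]
    }
}
```

With the parameters from my two cases, this gives the lower bounds of the
buckets I expected: [0-4000] [4000-] for the first and
[0-3200] [3200-4000] [4000-4050] [4050-] for the second.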
Thanks a lot in advance.
-Anubhav