Hi,
I am trying to understand the STCS algorithm. As per my current
understanding of the code, setting bucket_low seems to have no impact on
the STCS compaction algorithm. Moreover, I see a possible optimization. I would
appreciate it if one of the designers could correct me or confirm that it's a
bug, so that I can raise a JIRA.
Details
--------------
The getBuckets() method of SizeTieredCompactionStrategy sorts sstables by size
in ascending order and then iterates over them one by one, assigning each to an
existing or new bucket. When iterating sstables in ascending order of size, I
can't find ANY scenario where the current sstable in the outer loop
iteration is below the oldAverageSize of any existing bucket. The sstable
being iterated will ALWAYS be greater than or equal to the oldAverageSize of
ALL existing buckets, because ALL previous sstables in existing buckets were
smaller than or equal in size to it, and a bucket's average can never exceed
its largest member.
So, there is NO scenario where size > (oldAverageSize * bucketLow) and size <
oldAverageSize, so the bucket_low property never comes into play, at least for
any value below 1, such as the default of 0.5.
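
To make this easier to check, here is a minimal sketch of the bucketing loop
as I understand it. It is NOT the Cassandra source: the class name
BucketLowSketch and the bucket() method are made up for illustration, sstables
are reduced to plain positive sizes, the min_sstable_size branch is left out,
and I assume bucket_low < 1 < bucket_high. If my reading is right, every
bucket_low value should produce the same buckets.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, simplified model of the bucketing loop; NOT the Cassandra source.
class BucketLowSketch
{
    static List<List<Long>> bucket(List<Long> sizes, double bucketLow, double bucketHigh)
    {
        List<Long> sorted = new ArrayList<>(sizes);
        Collections.sort(sorted); // ascending by size

        // key = running average size of the bucket, value = member sizes
        Map<Long, List<Long>> buckets = new LinkedHashMap<>();

        outer:
        for (long size : sorted)
        {
            for (Map.Entry<Long, List<Long>> entry : buckets.entrySet())
            {
                long oldAverageSize = entry.getKey();
                List<Long> members = entry.getValue();
                if (size > oldAverageSize * bucketLow && size < oldAverageSize * bucketHigh)
                {
                    // re-key the bucket under its new average, then move on
                    buckets.remove(oldAverageSize);
                    long totalSize = members.size() * oldAverageSize;
                    long newAverageSize = (totalSize + size) / (members.size() + 1);
                    members.add(size);
                    buckets.put(newAverageSize, members);
                    continue outer;
                }
            }
            // no similar bucket found; start a new one
            List<Long> fresh = new ArrayList<>();
            fresh.add(size);
            buckets.put(size, fresh);
        }
        return new ArrayList<>(buckets.values());
    }

    public static void main(String[] args)
    {
        List<Long> sizes = Arrays.asList(10L, 12L, 50L, 55L, 60L, 400L);
        for (double bucketLow : new double[] { 0.1, 0.5, 0.9 })
            System.out.println("bucket_low=" + bucketLow + " -> " + bucket(sizes, bucketLow, 1.5));
    }
}
```

Running main() prints the buckets for three different bucket_low values so
they can be compared side by side.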
Also, while iterating over sstables (sortedFiles) by size in ascending order,
there is no point in iterating over all existing buckets. We could just start
from the LAST bucket, the one the previous sstable was assigned to. The
oldAverageSize of ALL other buckets is too small to EVER accept the sstable
being iterated. The loop I am referring to is:
    for (Entry<Long, List<T>> entry : buckets.entrySet())
    {...}
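
As a rough sketch of the idea (again NOT the Cassandra source), the variant
below compares a scan over all buckets with one that only looks at the most
recently created bucket. The names LastBucketSketch, fullScan() and
lastBucketOnly() are made up, sstables are plain positive sizes, the
min_sstable_size branch is omitted, and I assume bucket_low < 1 < bucket_high.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical comparison of two bucketing loops; NOT the Cassandra source.
class LastBucketSketch
{
    // Scans every existing bucket for each size (simplified model of the current loop).
    static List<List<Long>> fullScan(List<Long> sizes, double bucketLow, double bucketHigh)
    {
        List<Long> sorted = new ArrayList<>(sizes);
        Collections.sort(sorted);
        Map<Long, List<Long>> buckets = new LinkedHashMap<>();
        outer:
        for (long size : sorted)
        {
            for (Map.Entry<Long, List<Long>> entry : buckets.entrySet())
            {
                long oldAverageSize = entry.getKey();
                List<Long> members = entry.getValue();
                if (size > oldAverageSize * bucketLow && size < oldAverageSize * bucketHigh)
                {
                    buckets.remove(oldAverageSize);
                    long newAverageSize = (members.size() * oldAverageSize + size) / (members.size() + 1);
                    members.add(size);
                    buckets.put(newAverageSize, members);
                    continue outer;
                }
            }
            List<Long> fresh = new ArrayList<>();
            fresh.add(size);
            buckets.put(size, fresh);
        }
        return new ArrayList<>(buckets.values());
    }

    // Checks only the most recently created bucket (the proposed shortcut).
    static List<List<Long>> lastBucketOnly(List<Long> sizes, double bucketLow, double bucketHigh)
    {
        List<Long> sorted = new ArrayList<>(sizes);
        Collections.sort(sorted);
        List<List<Long>> buckets = new ArrayList<>();
        long lastAverage = 0; // running average of the last bucket
        for (long size : sorted)
        {
            List<Long> last = buckets.isEmpty() ? null : buckets.get(buckets.size() - 1);
            if (last != null && size > lastAverage * bucketLow && size < lastAverage * bucketHigh)
            {
                lastAverage = (last.size() * lastAverage + size) / (last.size() + 1);
                last.add(size);
            }
            else
            {
                List<Long> fresh = new ArrayList<>();
                fresh.add(size);
                buckets.add(fresh);
                lastAverage = size;
            }
        }
        return buckets;
    }

    public static void main(String[] args)
    {
        List<Long> sizes = Arrays.asList(400L, 10L, 55L, 12L, 60L, 50L);
        System.out.println("full scan:        " + fullScan(sizes, 0.5, 1.5));
        System.out.println("last bucket only: " + lastBucketOnly(sizes, 0.5, 1.5));
    }
}
```

This only demonstrates the point on sample data; whether the shortcut is safe
once min_sstable_size is taken into account would still need to be checked
against the real code.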
Thanks
Anuj