It's a long time since I looked at the code, but I'm pretty sure that
comment is explaining why we translate *no* timestamp to *epoch*, to save
space when serializing the encoding stats. It is not stipulating that the data
may be inaccurate.
However, being such a long time since I looked, I forgot we s
Finding the max timestamp of a partition is an aggregation. Doing that
calculation purely on the replica (whether pre-calculated or not) is problematic
for any CL > 1 in the face of deletions or updates that are missing. As the
contents of the partition on a given replica are different than what
First of all, thanks for all the ideas.
Benedict Elliott Smith, in the code comments I found a note that the data in
EncodingStats can be wrong, so I'm not sure it's a good idea to use it for accurate
results. As I understand it, incorrect data is not a problem for its current use
case, but it is for mine.
(Obviously, not to detract from the points that Jon and Jeremiah make, i.e.
that if TTLs or tombstones are involved the metadata we have, or can add,
is going to be worthless in most cases anyway)
On 14 January 2018 at 16:11, Benedict Elliott Smith
wrote:
> We already store the minimum timestamp
We already store the minimum timestamp in the EncodingStats of each
partition, to support more efficient encoding of atom timestamps. This
just isn't exposed beyond UnfilteredRowIterator, though it probably could
be.
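
To illustrate why a per-partition minimum helps with encoding, here is a minimal sketch of the general idea: timestamps stored as variable-length deltas from a known base take fewer bytes than the raw values. The varint scheme and the function names are assumptions made for illustration, not Cassandra's actual serialization format.

```python
def encode_varint(n):
    """Encode a non-negative integer, 7 bits per byte (LEB128-style)."""
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)  # more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def encoded_size(timestamps, base):
    """Total bytes needed to store each timestamp as a delta from `base`."""
    return sum(len(encode_varint(ts - base)) for ts in timestamps)


# Hypothetical microsecond timestamps for the cells of one partition.
timestamps = [1515946260000000, 1515946260000150, 1515946260004200]

raw_size = encoded_size(timestamps, 0)                  # no base available
delta_size = encoded_size(timestamps, min(timestamps))  # base = stored minimum
```

With the minimum as the base, every delta is non-negative and small, which is what keeps the encoding short. A maximum would not help the encoding in the same way, so it would need its own justification.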
Storing the max alongside would still require justification, though its
cost wou
Don’t forget about deleted and missing data. The bane of all on-replica
aggregation optimizations.
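
A small sketch of the problem, under simplifying assumptions: each replica is modelled as a dict of cell name to (timestamp, value), a delete is a tombstone marker, and reconciliation is plain last-write-wins per cell. All names here are hypothetical and this is not how Cassandra represents data internally.

```python
TOMBSTONE = object()  # hypothetical marker for a deleted cell


def merge(*replicas):
    """Reconcile replicas cell by cell, keeping the newest write."""
    merged = {}
    for replica in replicas:
        for cell, (ts, value) in replica.items():
            if cell not in merged or ts > merged[cell][0]:
                merged[cell] = (ts, value)
    return merged


def max_live_timestamp(cells):
    """Max timestamp over cells that are not deleted, or None if none are live."""
    live = [ts for ts, value in cells.values() if value is not TOMBSTONE]
    return max(live) if live else None


# Replica A missed the delete of c2 that replica B received.
replica_a = {"c1": (10, "x"), "c2": (20, "y")}
replica_b = {"c1": (10, "x"), "c2": (25, TOMBSTONE)}

per_replica = [max_live_timestamp(replica_a), max_live_timestamp(replica_b)]
reconciled = max_live_timestamp(merge(replica_a, replica_b))
```

Taking the max of the per-replica answers reports the timestamp of a cell that the reconciled partition no longer contains, so the aggregate only comes out right when it is computed after the merge.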
> On Jan 14, 2018, at 12:07 AM, Jeff Jirsa wrote:
>
>
> You’re right it’s not stored in metadata now. Adding this to metadata isn’t
> hard, it’s just hard to do it right where it’s useful to p