Finding the max timestamp of a partition is an aggregation. Doing that calculation purely on the replica (whether pre-calculated or not) is problematic for any CL > 1 in the face of deletions or missed updates, because the contents of the partition on a given replica can differ from what they would be once merged on the coordinator.
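To make the hazard concrete, here is a minimal Python sketch (a hypothetical model, not Cassandra code; all names are illustrative): one replica misses a tombstone, so its locally pre-computed "max timestamp of live data" disagrees with the coordinator-merged answer.

```python
# Hypothetical model of why a replica-local "max timestamp" answer breaks
# at CL > 1 when a replica misses a deletion. Each replica stores a
# partition as {column: (value, write_timestamp)}; a tombstone is
# modelled as a None value with the deletion's timestamp.

def merge(*replicas):
    """Coordinator-style reconciliation: highest timestamp wins per column."""
    merged = {}
    for cells in replicas:
        for col, (val, ts) in cells.items():
            if col not in merged or ts > merged[col][1]:
                merged[col] = (val, ts)
    return merged

def max_live_timestamp(cells):
    """Max timestamp over live (non-tombstone) cells, or None if none remain."""
    live = [ts for val, ts in cells.values() if val is not None]
    return max(live) if live else None

# Column "x" was written at ts=9 and deleted at ts=12, but replica_a never
# received the tombstone. Column "y" was written at ts=5 on both replicas.
replica_a = {"x": ("v1", 9), "y": ("w1", 5)}
replica_b = {"x": (None, 12), "y": ("w1", 5)}

print(max_live_timestamp(replica_a))                    # 9: replica-local answer
print(max_live_timestamp(merge(replica_a, replica_b)))  # 5: correct merged answer
```

Replica A would confidently answer 9, yet after merging in the tombstone the true max live timestamp is 5; no amount of per-replica pre-computation can fix this without reading enough replicas to reconcile.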
> On Jan 14, 2018, at 3:33 PM, "arhel...@gmail.com" <arhel...@gmail.com> wrote:
>
> First of all, thx for all the ideas.
>
> Benedict Elliott Smith, in code comments I found a notice that the data in
> EncodingStats can be wrong, so I'm not sure it's a good idea to use it for
> accurate results. As I understand it, incorrect data is not a problem for its
> current use case, but it is for mine. Currently, I have added fields to every
> AtomicBTreePartition. I update those fields in the addAllWithSizeDelta call,
> but I now see that I should also think about the case of data removal.
>
> I currently don't really care about TTLs, but it's a case I should think
> about, thx.
>
> Jeremiah Jordan, thx for the notice, but I don't really get what you mean
> about replica aggregation optimizations. Can you please explain it to me?
>
>> On 2018-01-14 17:16, Benedict Elliott Smith <bened...@apache.org> wrote:
>>
>> (Obviously, not to detract from the points that Jon and Jeremiah make, i.e.
>> that if TTLs or tombstones are involved, the metadata we have, or can add,
>> is going to be worthless in most cases anyway.)
>>
>> On 14 January 2018 at 16:11, Benedict Elliott Smith <bened...@apache.org>
>> wrote:
>>
>>> We already store the minimum timestamp in the EncodingStats of each
>>> partition, to support more efficient encoding of atom timestamps. This
>>> just isn't exposed beyond UnfilteredRowIterator, though it probably
>>> could be.
>>>
>>> Storing the max alongside would still require justification, though its
>>> cost would actually be fairly nominal (probably only a few bytes; it
>>> depends on how far apart min/max are).
>>>
>>> I'm not sure (IMO) that even a fairly nominal cost could be justified
>>> unless there were widespread benefit, though, which I'm not sure this
>>> would provide. Maintaining a patched variant of your own that stores
>>> this probably wouldn't be too hard, though.
>>>
>>> In the meantime, exposing and utilising the minimum timestamp from
>>> EncodingStats is probably a good place to start to explore the viability
>>> of the approach.
>>>
>>> On 14 January 2018 at 15:34, Jeremiah Jordan <jerem...@datastax.com>
>>> wrote:
>>>
>>>> Don't forget about deleted and missing data. The bane of all on-replica
>>>> aggregation optimizations.
>>>>
>>>>> On Jan 14, 2018, at 12:07 AM, Jeff Jirsa <jji...@gmail.com> wrote:
>>>>>
>>>>> You're right, it's not stored in metadata now. Adding this to metadata
>>>>> isn't hard; it's just hard to do it right so that it's useful to people
>>>>> with other data models (besides yours) and can make it upstream (if
>>>>> that's your goal). In particular, the worst possible case is a table
>>>>> with no clustering key and a single non-partition-key column. In that
>>>>> case, storing these two extra long timestamps may be 2-3x more storage
>>>>> than without, which would be a huge regression, so you'd have to have a
>>>>> way to turn the feature off.
>>>>>
>>>>> Worth mentioning that there are ways to do this without altering
>>>>> Cassandra: consider using static columns that represent the min
>>>>> timestamp and max timestamp. Create them both as ints or longs and
>>>>> write them on all inserts/updates (as part of a batch, if needed). The
>>>>> only thing you'll have to do is find a way for "min timestamp" to work:
>>>>> you can set the min timestamp column with an explicit "USING TIMESTAMP"
>>>>> of 2^31 - NOW, so that future writes won't overwrite those values. That
>>>>> gives you first-write-wins behavior for that column, which gives you an
>>>>> effective min timestamp for the partition as a whole.
>>>>>
>>>>> --
>>>>> Jeff Jirsa
>>>>>
>>>>>> On Jan 13, 2018, at 4:58 AM, Arthur Kushka <arhel...@gmail.com> wrote:
>>>>>>
>>>>>> Hi folks,
>>>>>>
>>>>>> Currently, I am working on a custom CQL operator that should return
>>>>>> the max timestamp for some partition.
>>>>>>
>>>>>> I don't think that scanning the partition for that kind of data is a
>>>>>> nice idea. Instead, I am thinking about adding metadata to the
>>>>>> partition. I want to store minTimestamp and maxTimestamp for every
>>>>>> partition, as is already done in Memtables. Those timestamps would be
>>>>>> updated on each mutation operation, which is quite cheap in comparison
>>>>>> to a full scan.
>>>>>>
>>>>>> I am quite new to the Cassandra codebase and want to get some critique
>>>>>> and ideas; maybe that kind of data is already stored somewhere, or you
>>>>>> have better ideas. Is my assumption right?
>>>>>>
>>>>>> Best,
>>>>>> Artur
>>>>>
>>>>> ---------------------------------------------------------------------
>>>>> To unsubscribe, e-mail: dev-unsubscr...@cassandra.apache.org
>>>>> For additional commands, e-mail: dev-h...@cassandra.apache.org
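[Editor's sketch] The "first write wins" trick Jeff Jirsa describes above relies on Cassandra reconciling conflicting cell writes by keeping the highest write timestamp (last-write-wins). If every writer sets the static min-timestamp column with USING TIMESTAMP 2^31 - now, the earliest wall-clock writer carries the highest write timestamp, so its value survives. A minimal Python model of that arithmetic (illustrative only, not CQL or Cassandra code; names are made up):

```python
# Model of the inverted-timestamp trick: cells are (value, write_ts) pairs
# and reconciliation is last-write-wins on write_ts, as in Cassandra.
SENTINEL = 2**31  # per the thread: write timestamp = 2^31 - now

def reconcile(cell_a, cell_b):
    """Last-write-wins: the cell with the higher write timestamp survives."""
    return cell_a if cell_a[1] >= cell_b[1] else cell_b

def write_min_ts(now):
    """Each insert/update also writes the static column as
    (value=now, write_ts=SENTINEL - now), inverting the ordering."""
    return (now, SENTINEL - now)

first = write_min_ts(1_000)  # earliest wall-clock write: highest write_ts
later = write_min_ts(5_000)  # later write: lower write_ts, so it loses

surviving = reconcile(later, first)
print(surviving[0])  # 1000: the first writer's value wins -> effective min
```

The max-timestamp static column needs no trick at all: written normally, last-write-wins already keeps the latest value.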