Finding the max timestamp of a partition is an aggregation. Doing that calculation purely on the replica (whether pre-calculated or not) is problematic for any CL > 1 in the face of deletions or missed updates, because the contents of the partition on a given replica can differ from what they would be once merged on the coordinator.
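To make the hazard concrete, here is a minimal Python sketch (a hypothetical model, not Cassandra code; all names are illustrative): one replica misses a tombstone, so its locally pre-computed "max timestamp of live data" disagrees with the coordinator-merged answer.

```python
# Hypothetical model of why a replica-local "max timestamp" answer breaks
# at CL > 1 when a replica misses a deletion. Each replica stores a
# partition as {column: (value, write_timestamp)}; a tombstone is
# modelled as a None value with the deletion's timestamp.

def merge(*replicas):
    """Coordinator-style reconciliation: highest timestamp wins per column."""
    merged = {}
    for cells in replicas:
        for col, (val, ts) in cells.items():
            if col not in merged or ts > merged[col][1]:
                merged[col] = (val, ts)
    return merged

def max_live_timestamp(cells):
    """Max timestamp over live (non-tombstone) cells, or None if none remain."""
    live = [ts for val, ts in cells.values() if val is not None]
    return max(live) if live else None

# Column "x" was written at ts=9 and deleted at ts=12, but replica_a never
# received the tombstone. Column "y" was written at ts=5 on both replicas.
replica_a = {"x": ("v1", 9), "y": ("w1", 5)}
replica_b = {"x": (None, 12), "y": ("w1", 5)}

print(max_live_timestamp(replica_a))                    # 9: replica-local answer
print(max_live_timestamp(merge(replica_a, replica_b)))  # 5: correct merged answer
```

Replica A would confidently answer 9, yet after merging in the tombstone the true max live timestamp is 5; no amount of per-replica pre-computation can fix this without reading enough replicas to reconcile.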
> On Jan 14, 2018, at 3:33 PM, "arhel...@gmail.com" <arhel...@gmail.com> wrote:
>
> First of all, thx for all the ideas.
>
> Benedict Elliott Smith, in code comments I found a notice that the data in
> EncodingStats can be wrong, so I'm not sure it's a good idea to use it for
> accurate results. As I understand it, incorrect data is not a problem for its
> current use case, but it is for mine. Currently, I have added fields to every
> AtomicBTreePartition. I update those fields in the addAllWithSizeDelta call,
> but I now see that I should also think about the case of data removal.
>
> I currently don't really care about TTLs, but it's a case I should think
> about, thx.
>
> Jeremiah Jordan, thx for the notice, but I don't really get what you mean
> about replica aggregation optimizations. Can you please explain it to me?
>
>> On 2018-01-14 17:16, Benedict Elliott Smith <bened...@apache.org> wrote:
>>
>> (Obviously, not to detract from the points that Jon and Jeremiah make, i.e.
>> that if TTLs or tombstones are involved, the metadata we have, or can add,
>> is going to be worthless in most cases anyway.)
>>
>> On 14 January 2018 at 16:11, Benedict Elliott Smith <bened...@apache.org>
>> wrote:
>>
>>> We already store the minimum timestamp in the EncodingStats of each
>>> partition, to support more efficient encoding of atom timestamps. This
>>> just isn't exposed beyond UnfilteredRowIterator, though it probably
>>> could be.
>>>
>>> Storing the max alongside would still require justification, though its
>>> cost would actually be fairly nominal (probably only a few bytes; it
>>> depends on how far apart min/max are).
>>>
>>> I'm not sure (IMO) that even a fairly nominal cost could be justified
>>> unless there were widespread benefit, though, which I'm not sure this
>>> would provide. Maintaining a patched variant of your own that stores
>>> this probably wouldn't be too hard, though.
>>>
>>> In the meantime, exposing and utilising the minimum timestamp from
>>> EncodingStats is probably a good place to start to explore the viability
>>> of the approach.
>>>
>>> On 14 January 2018 at 15:34, Jeremiah Jordan <jerem...@datastax.com>
>>> wrote:
>>>
>>>> Don't forget about deleted and missing data. The bane of all on-replica
>>>> aggregation optimizations.
>>>>
>>>>> On Jan 14, 2018, at 12:07 AM, Jeff Jirsa <jji...@gmail.com> wrote:
>>>>>
>>>>> You're right, it's not stored in metadata now. Adding this to metadata
>>>>> isn't hard; it's just hard to do it right so that it's useful to people
>>>>> with other data models (besides yours) and can make it upstream (if
>>>>> that's your goal). In particular, the worst possible case is a table
>>>>> with no clustering key and a single non-partition-key column. In that
>>>>> case, storing these two extra long timestamps may be 2-3x more storage
>>>>> than without, which would be a huge regression, so you'd have to have a
>>>>> way to turn the feature off.
>>>>>
>>>>> Worth mentioning that there are ways to do this without altering
>>>>> Cassandra: consider using static columns that represent the min
>>>>> timestamp and max timestamp. Create them both as ints or longs and
>>>>> write them on all inserts/updates (as part of a batch, if needed). The
>>>>> only thing you'll have to do is find a way for "min timestamp" to work:
>>>>> you can set the min timestamp column with an explicit "USING TIMESTAMP"
>>>>> of 2^31 - NOW, so that future writes won't overwrite those values. That
>>>>> gives you first-write-wins behavior for that column, which gives you an
>>>>> effective min timestamp for the partition as a whole.
>>>>>
>>>>> --
>>>>> Jeff Jirsa
>>>>>
>>>>>> On Jan 13, 2018, at 4:58 AM, Arthur Kushka <arhel...@gmail.com> wrote:
>>>>>>
>>>>>> Hi folks,
>>>>>>
>>>>>> Currently, I am working on a custom CQL operator that should return
>>>>>> the max timestamp for some partition.
>>>>>>
>>>>>> I don't think that scanning the partition for that kind of data is a
>>>>>> nice idea. Instead, I am thinking about adding metadata to the
>>>>>> partition. I want to store minTimestamp and maxTimestamp for every
>>>>>> partition, as is already done in Memtables. Those timestamps would be
>>>>>> updated on each mutation operation, which is quite cheap in comparison
>>>>>> to a full scan.
>>>>>>
>>>>>> I am quite new to the Cassandra codebase and want to get some critique
>>>>>> and ideas; maybe that kind of data is already stored somewhere, or you
>>>>>> have better ideas. Is my assumption right?
>>>>>>
>>>>>> Best,
>>>>>> Artur
>>>>>
>>>>> ---------------------------------------------------------------------
>>>>> To unsubscribe, e-mail: dev-unsubscr...@cassandra.apache.org
>>>>> For additional commands, e-mail: dev-h...@cassandra.apache.org
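[Editor's sketch] The "first write wins" trick Jeff Jirsa describes above relies on Cassandra reconciling conflicting cell writes by keeping the highest write timestamp (last-write-wins). If every writer sets the static min-timestamp column with USING TIMESTAMP 2^31 - now, the earliest wall-clock writer carries the highest write timestamp, so its value survives. A minimal Python model of that arithmetic (illustrative only, not CQL or Cassandra code; names are made up):

```python
# Model of the inverted-timestamp trick: cells are (value, write_ts) pairs
# and reconciliation is last-write-wins on write_ts, as in Cassandra.
SENTINEL = 2**31  # per the thread: write timestamp = 2^31 - now

def reconcile(cell_a, cell_b):
    """Last-write-wins: the cell with the higher write timestamp survives."""
    return cell_a if cell_a[1] >= cell_b[1] else cell_b

def write_min_ts(now):
    """Each insert/update also writes the static column as
    (value=now, write_ts=SENTINEL - now), inverting the ordering."""
    return (now, SENTINEL - now)

first = write_min_ts(1_000)  # earliest wall-clock write: highest write_ts
later = write_min_ts(5_000)  # later write: lower write_ts, so it loses

surviving = reconcile(later, first)
print(surviving[0])  # 1000: the first writer's value wins -> effective min
```

The max-timestamp static column needs no trick at all: written normally, last-write-wins already keeps the latest value.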