adnanhemani commented on PR #1844:
URL: https://github.com/apache/polaris/pull/1844#issuecomment-2961406619

   Thanks for the high-level comments, @snazy!
   
   > It looks that this change introduces another serialization implementation, 
although we already have Jackson in place. Can you explain why that's needed?
   
   My anecdotal experience is that Jackson is a much slower serialization 
library than Kryo and given that we want to make these commands quick so that 
the user does not notice a difference, I went ahead with a quicker 
implementation. I do understand that this brings additional 
overhead/maintainance - so I'm okay to shift the code to use Jackson instead if 
that's a hard requirement. WDYT?
   
   > It's a bit unclear why these "buffers" are needed. Would you mind 
elaborating on this?
   
   This was earlier discussed on the [ML 
thread](https://lists.apache.org/thread/jld9cnlv585y5sf8ybl5hkmrnjqskb6p). 
TL;DR on this is: if we aim to support some read activities in the near future, 
flushing to persistence constantly per call will become quite heavy on the 
persistence. To ensure we are not hammering the persistence (which in the 
current case is the same as the metastore) and causing an accidentally DDoS, 
these buffers will help.
   
   > I have some concerns around how the approach could be implemented in the 
NoSQL world, via https://github.com/apache/polaris/pull/1189. There we already 
have a way to iterate over recent changes (aka: events). Do you have some idea 
around how both could live together? Would be nice to consider NoSQL in this 
new approach, because we already agreed that NoSQL will become part of Polaris. 
I think it's better to have a way that satisfies both persistence worlds.
   
   I took a brief look at the linked PR for NoSQL - but to be fair, it is a 
massive PR, so please correct and/or augment my knowledge as required.
   
   Per my understanding, the changes are stored to commit to the persistence. 
I'm in agreement that this is very similar to events in terms of what they are 
representing, but not in terms of the way they are being used. Events (which I 
am only modifying in this PR) are to be used as administrator-facing 
representation of customer-triggered actions, whereas Changes (being introduced 
in the NoSQL PR) are a way for committing actions to the persistence. Given 
that Events should not be solely for changes to the persistence, I don't really 
see how Changes can be used to power and/or replace Events. 
   
   The way I imagine things would be that Events can (and should) be stored in 
the NoSQL persistence as well - and any calls to the future Events API should 
understand which type of persistence layer was used for Events storage and 
delegate the call to that persistence type. That is why I have introduced an 
append-only `writeEvents` method in the `BasePersistence` interface - NoSQL 
should also implement that.
   
   Base line to state: I 100% agree that the Events functionality should exist 
in NoSQL as well as JDBC-Relational - and I'm happy to help contribute towards 
this once NoSQL finalizes and merges. But I'm not sure that Changes is helpful 
in this journey - unless we'd like to evolve that object into something that 
can represent all events as well.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to