adnanhemani commented on PR #1844: URL: https://github.com/apache/polaris/pull/1844#issuecomment-2961406619
Thanks for the high-level comments, @snazy! > It looks that this change introduces another serialization implementation, although we already have Jackson in place. Can you explain why that's needed? My anecdotal experience is that Jackson is a much slower serialization library than Kryo and given that we want to make these commands quick so that the user does not notice a difference, I went ahead with a quicker implementation. I do understand that this brings additional overhead/maintainance - so I'm okay to shift the code to use Jackson instead if that's a hard requirement. WDYT? > It's a bit unclear why these "buffers" are needed. Would you mind elaborating on this? This was earlier discussed on the [ML thread](https://lists.apache.org/thread/jld9cnlv585y5sf8ybl5hkmrnjqskb6p). TL;DR on this is: if we aim to support some read activities in the near future, flushing to persistence constantly per call will become quite heavy on the persistence. To ensure we are not hammering the persistence (which in the current case is the same as the metastore) and causing an accidentally DDoS, these buffers will help. > I have some concerns around how the approach could be implemented in the NoSQL world, via https://github.com/apache/polaris/pull/1189. There we already have a way to iterate over recent changes (aka: events). Do you have some idea around how both could live together? Would be nice to consider NoSQL in this new approach, because we already agreed that NoSQL will become part of Polaris. I think it's better to have a way that satisfies both persistence worlds. I took a brief look at the linked PR for NoSQL - but to be fair, it is a massive PR, so please correct and/or augment my knowledge as required. Per my understanding, the changes are stored to commit to the persistence. I'm in agreement that this is very similar to events in terms of what they are representing, but not in terms of the way they are being used. Events (which I am only modifying in this PR) are to be used as administrator-facing representation of customer-triggered actions, whereas Changes (being introduced in the NoSQL PR) are a way for committing actions to the persistence. Given that Events should not be solely for changes to the persistence, I don't really see how Changes can be used to power and/or replace Events. The way I imagine things would be that Events can (and should) be stored in the NoSQL persistence as well - and any calls to the future Events API should understand which type of persistence layer was used for Events storage and delegate the call to that persistence type. That is why I have introduced an append-only `writeEvents` method in the `BasePersistence` interface - NoSQL should also implement that. Base line to state: I 100% agree that the Events functionality should exist in NoSQL as well as JDBC-Relational - and I'm happy to help contribute towards this once NoSQL finalizes and merges. But I'm not sure that Changes is helpful in this journey - unless we'd like to evolve that object into something that can represent all events as well. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
