jackye1995 commented on PR #7105: URL: https://github.com/apache/iceberg/pull/7105#issuecomment-1482286193
Sorry, I totally overlooked that. I would also -1 using a specific external dependency like RocksDB or HBase, which is probably why I quickly skipped those options. But I feel the semantics required for partition stats just do not fit a file storage system: as you said, it ends up forcing a choice between CoW and MoR, which seems like too much complexity just to manage some additional stats.

I think we can start from a file storage (FileIO) based solution, but the spec should be defined at a higher level so that it could be backed by more efficient solutions. I guess the same argument applies to things like the manifest list: today, rolling up the manifest list is a bottleneck for write operations, and some kind of design backed by a key-value store could solve that bottleneck. Maybe we should think about that and try to solve these cases together? Just like we have `FileIO`, which works really well with object storage semantics, we could have something like `VersionedListStore` that works well with any mutable but versioned list.
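
To make the `VersionedListStore` idea a bit more concrete, here is a rough sketch of what such an abstraction might look like. Only the interface name comes from the comment above; the method names, the optimistic `commit` semantics, and the in-memory implementation are all hypothetical and are not an existing Iceberg API. A FileIO-backed or key-value-backed implementation would sit behind the same interface.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical interface, NOT part of Iceberg: a mutable list whose
// every committed state stays readable under its version number.
interface VersionedListStore<T> {
  // Latest committed version; version 0 is the empty list.
  long currentVersion();

  // Immutable view of the list as of the given version.
  List<T> read(long version);

  // Applies removals, then additions, on top of baseVersion and returns
  // the new version. Fails if baseVersion is no longer the current one.
  long commit(long baseVersion, List<T> added, List<T> removed);
}

// Toy in-memory implementation (assumption: for illustration only).
// It keeps a full copy per version; a real backend would store deltas.
final class InMemoryVersionedListStore<T> implements VersionedListStore<T> {
  private final Map<Long, List<T>> versions = new HashMap<>();
  private long current = 0L;

  InMemoryVersionedListStore() {
    versions.put(0L, Collections.<T>emptyList());
  }

  @Override
  public synchronized long currentVersion() {
    return current;
  }

  @Override
  public synchronized List<T> read(long version) {
    List<T> snapshot = versions.get(version);
    if (snapshot == null) {
      throw new IllegalArgumentException("Unknown version: " + version);
    }
    return snapshot;
  }

  @Override
  public synchronized long commit(long baseVersion, List<T> added, List<T> removed) {
    if (baseVersion != current) {
      // Optimistic concurrency: the caller must re-read and retry.
      throw new IllegalStateException(
          "Stale base version " + baseVersion + ", current is " + current);
    }
    List<T> next = new ArrayList<>(versions.get(current));
    next.removeAll(removed);
    next.addAll(added);
    current = current + 1;
    versions.put(current, Collections.unmodifiableList(next));
    return current;
  }
}
```

With something like this, a writer would append or drop entries (for example manifest or partition stats entries) with a single `commit` call against the version it read, instead of rewriting the whole list file on every write.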
