anuragrai16 commented on PR #17380: URL: https://github.com/apache/pinot/pull/17380#issuecomment-3663704040
> We need to discuss when we should use the data CRC instead of the index CRC, and what the side effects are. When using the data CRC, an index-only change in the deep store (i.e. a new index added) won't be honored. This could prevent users from creating the index from a minion and thereby reducing index creation on the server. Given that we want to solve the problem of real-time committed segments potentially having different CRCs, I feel a better way to address this is to add a flag in the ZK metadata indicating that only the data CRC needs to be checked. This flag would exist only for committed segments, not for segments pushed from other ingestion flows.

Thanks @Jackie-Jiang, we can do that.

For my understanding: in the current code, I'm only using the data CRC in `doAddOnlineSegment` of `OfflineTableDataManager` and `RealtimeTableDataManager`, which is called during the Helix state transitions `onBecomeOnlineFromConsuming` and `onBecomeOnlineFromOffline`. The other flows, such as segment reload and segment replacement (used by minions), do not use the data CRC. So, in a way, is the code already handling this point? Or are there other flows that might accidentally be included in this?
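
To make the flag-based proposal concrete, here is a minimal sketch of how the check could be gated. Everything in it is hypothetical and for illustration only: the class `CrcCheckSketch`, the `SegmentCrcs` holder, the metadata key `segment.useDataCrcOnly`, and the method `isLocalSegmentCurrent` are assumptions, not existing Pinot APIs or ZK metadata fields.

```java
import java.util.Map;

public class CrcCheckSketch {
    // Hypothetical ZK metadata key; assumed to be set only on committed real-time segments.
    static final String DATA_CRC_ONLY_KEY = "segment.useDataCrcOnly";

    // Hypothetical, simplified holder for the two CRCs of a segment.
    static final class SegmentCrcs {
        final long indexCrc;
        final long dataCrc;

        SegmentCrcs(long indexCrc, long dataCrc) {
            this.indexCrc = indexCrc;
            this.dataCrc = dataCrc;
        }
    }

    // Returns true when the local copy can be kept without re-downloading.
    // With the flag set, only the data CRC is compared; otherwise the index CRC
    // is compared, so index-only changes in the deep store are still honored.
    static boolean isLocalSegmentCurrent(SegmentCrcs local, SegmentCrcs zk,
                                         Map<String, String> zkMetadata) {
        boolean dataCrcOnly =
            Boolean.parseBoolean(zkMetadata.getOrDefault(DATA_CRC_ONLY_KEY, "false"));
        if (dataCrcOnly) {
            return local.dataCrc == zk.dataCrc;
        }
        return local.indexCrc == zk.indexCrc;
    }

    public static void main(String[] args) {
        SegmentCrcs local = new SegmentCrcs(1L, 9L);
        SegmentCrcs zk = new SegmentCrcs(2L, 9L); // same data, different index CRC

        // Committed segment carrying the flag: the data CRC decides.
        System.out.println(isLocalSegmentCurrent(local, zk, Map.of(DATA_CRC_ONLY_KEY, "true")));
        // Segment pushed from another ingestion flow (no flag): the index CRC decides.
        System.out.println(isLocalSegmentCurrent(local, zk, Map.of()));
    }
}
```

Under this sketch, the choice of CRC depends on the segment's own metadata rather than on which code path is calling, so segments pushed from other ingestion flows would keep the index CRC comparison even if they pass through `doAddOnlineSegment`.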
