jadami10 commented on PR #17254: URL: https://github.com/apache/pinot/pull/17254#issuecomment-3577480262
> Will this issue happen to Dedup as well?

My understanding of dedup is that [here](https://github.com/apache/pinot/blob/master/pinot-segment-local/src/main/java/org/apache/pinot/segment/local/dedup/ConcurrentMapPartitionDedupMetadataManager.java#L125-L127), we already treat metadata that is out of TTL as non-existent. So it doesn't matter when the cleanup happens; an expired primary key is ignored either way. In the existing OSS code this works correctly because everything is backed by a concurrent hash map, so all of the operations are thread safe. Since dedup is per partition, it's also assumed that all events for a single partition are ordered, so side effects like restarting a server will lead to the same events being ingested in the same order. Therefore, we should always apply deduplication consistently regardless of whether the cleanup is done when the new consuming segment is created or on some other schedule.

I haven't compared this to how upsert works since they seemed like separate implementations. Are they sharing some code/behavior I'm missing?
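To make the "expired means non-existent" assumption concrete, here is a minimal, self-contained sketch of a ConcurrentHashMap-backed, TTL-aware dedup check. The class, field, and method names are illustrative only and do not mirror Pinot's actual `ConcurrentMapPartitionDedupMetadataManager` API; the point is just that an out-of-TTL entry is treated as absent at lookup time, so the result is the same whether or not the cleanup has physically removed it yet.

```java
import java.util.concurrent.ConcurrentHashMap;

/**
 * Illustrative sketch only; names are hypothetical and not Pinot's real implementation.
 * Maps primary-key hash -> ingestion time of the record that first claimed the key.
 */
public class TtlDedupSketch {
  private final ConcurrentHashMap<Object, Double> _primaryKeyToTime = new ConcurrentHashMap<>();
  private final double _ttlSeconds;

  public TtlDedupSketch(double ttlSeconds) {
    _ttlSeconds = ttlSeconds;
  }

  /**
   * Returns true if the key already exists within TTL (i.e. the new record is a duplicate
   * and should be dropped). An expired entry is treated as non-existent and the new record
   * claims the key, regardless of whether a background cleanup has removed the entry yet.
   */
  public boolean checkRecordPresentOrUpdate(Object primaryKeyHash, double ingestionTimeSeconds) {
    double ttlWatermark = ingestionTimeSeconds - _ttlSeconds;
    boolean[] isDuplicate = new boolean[1];
    // compute() runs atomically per key on ConcurrentHashMap, so concurrent consumers
    // for the same partition see a consistent view without extra locking.
    _primaryKeyToTime.compute(primaryKeyHash, (key, existingTime) -> {
      if (existingTime == null || existingTime < ttlWatermark) {
        // Absent or out of TTL: treat as non-existent and claim the key for this record.
        return ingestionTimeSeconds;
      }
      // Key exists and is still within TTL: this record is a duplicate.
      isDuplicate[0] = true;
      return existingTime;
    });
    return isDuplicate[0];
  }
}
```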
