tibrewalpratik17 opened a new issue, #11796: URL: https://github.com/apache/pinot/issues/11796
In one of our use cases, a lot of out-of-order events are sent to Kafka. Although out-of-order events are not used to update the [Metadata HashMap](https://github.com/apache/pinot/blob/1b7cb166de34f18c9a170cd74934162b3ea103b0/pinot-segment-local/src/main/java/org/apache/pinot/segment/local/upsert/ConcurrentMapPartitionUpsertMetadataManager.java#L54), we still [persist the row](https://github.com/apache/pinot/blob/1b7cb166de34f18c9a170cd74934162b3ea103b0/pinot-segment-local/src/main/java/org/apache/pinot/segment/local/indexsegment/mutable/MutableSegmentImpl.java#L493) in the table, without any merge (in the case of partial upsert).

I propose adding a boolean config `disableIngestingOutOfOrderEvents` so that these events do not end up in the segments at all, which also saves disk space. This would also avoid confusion when using `skipUpsert` on partial-upsert tables: nulls start showing up in columns where a previous non-null value was seen, and we cannot tell whether the row is an out-of-order event or not.

We should also add a virtual boolean column `$.isOutOfOrderEvent` to detect these scenarios and make debugging easier when using `skipUpsert`. This is orthogonal to the `disableIngestingOutOfOrderEvents` proposal above.

cc @Jackie-Jiang let me know your thoughts on this.
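
To make the proposal concrete, here is a minimal sketch of the intended behaviour. It is not Pinot code: the class `OutOfOrderSketch`, its methods, and the use of a `long` comparison value per primary key are all hypothetical, and it assumes a row counts as out-of-order only when its comparison value is strictly lower than the latest one seen for the same key.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration only; none of these names exist in Pinot.
class OutOfOrderSketch {
    // Mirrors the proposed `disableIngestingOutOfOrderEvents` config.
    private final boolean disableIngestingOutOfOrderEvents;
    // Latest comparison value (for example, an event timestamp) seen per primary key.
    private final Map<String, Long> latestComparisonValue = new HashMap<>();

    OutOfOrderSketch(boolean disableIngestingOutOfOrderEvents) {
        this.disableIngestingOutOfOrderEvents = disableIngestingOutOfOrderEvents;
    }

    // True when the row is strictly older than the newest row already seen for its key.
    // This is the value the proposed `$.isOutOfOrderEvent` virtual column would expose.
    boolean isOutOfOrder(String primaryKey, long comparisonValue) {
        Long latest = latestComparisonValue.get(primaryKey);
        return latest != null && comparisonValue < latest;
    }

    // Decides whether the row should be written to the segment.
    // In-order rows always are, and they also advance the per-key metadata.
    // Out-of-order rows never touch the metadata and are written only when the flag is off.
    boolean shouldIndex(String primaryKey, long comparisonValue) {
        if (isOutOfOrder(primaryKey, comparisonValue)) {
            return !disableIngestingOutOfOrderEvents;
        }
        latestComparisonValue.put(primaryKey, comparisonValue);
        return true;
    }

    public static void main(String[] args) {
        OutOfOrderSketch sketch = new OutOfOrderSketch(true);
        System.out.println(sketch.shouldIndex("k1", 100L)); // true: first row for the key
        System.out.println(sketch.shouldIndex("k1", 90L));  // false: older row is dropped
        System.out.println(sketch.shouldIndex("k1", 100L)); // true: equal value is treated as in-order
    }
}
```

With the flag set to `false`, the older row would still be written, which matches the current behaviour described above; `isOutOfOrder` would then be the only way to tell such rows apart when querying with `skipUpsert`.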