tibrewalpratik17 opened a new issue, #11796:
URL: https://github.com/apache/pinot/issues/11796

   In one of our use cases, a lot of out-of-order events are sent to Kafka.
Although we don't consider out-of-order events when updating the [metadata
HashMap](https://github.com/apache/pinot/blob/1b7cb166de34f18c9a170cd74934162b3ea103b0/pinot-segment-local/src/main/java/org/apache/pinot/segment/local/upsert/ConcurrentMapPartitionUpsertMetadataManager.java#L54),
we still [persist the
row](https://github.com/apache/pinot/blob/1b7cb166de34f18c9a170cd74934162b3ea103b0/pinot-segment-local/src/main/java/org/apache/pinot/segment/local/indexsegment/mutable/MutableSegmentImpl.java#L493)
in the table anyway, without any merge (in the case of partial upsert).
   
   I propose adding a boolean config `disableIngestingOutOfOrderEvents` so
that these events don't end up in the segments at all, which also saves disk
space. It would also avoid confusion when using `skipUpsert` on
partial-upsert tables: nulls show up in columns where a non-null value was
ingested earlier, and we can't tell whether the row is an out-of-order event
or not.
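
To make the proposal concrete, here is a minimal sketch of the two behaviours. It is a toy model, not Pinot's actual code: the class `UpsertIngestionSketch`, its fields, and the `ingest` method are hypothetical, and it assumes an event counts as out of order only when its comparison value is strictly lower than the latest one seen for the same primary key.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Toy model of upsert ingestion; all names here are hypothetical, not Pinot classes. */
class UpsertIngestionSketch {
  // Mirrors the proposed `disableIngestingOutOfOrderEvents` config.
  private final boolean disableIngestingOutOfOrderEvents;
  // Latest comparison value per primary key (stands in for the metadata map).
  private final Map<String, Long> latestComparisonValue = new HashMap<>();
  // Payloads written to the segment (stands in for the mutable segment).
  private final List<String> persistedRows = new ArrayList<>();

  UpsertIngestionSketch(boolean disableIngestingOutOfOrderEvents) {
    this.disableIngestingOutOfOrderEvents = disableIngestingOutOfOrderEvents;
  }

  /** Ingests one event and returns true if the row was persisted. */
  boolean ingest(String primaryKey, long comparisonValue, String payload) {
    Long latest = latestComparisonValue.get(primaryKey);
    // Assumption: strictly lower comparison value means out of order.
    boolean outOfOrder = latest != null && comparisonValue < latest;
    if (outOfOrder) {
      if (disableIngestingOutOfOrderEvents) {
        return false; // proposed behaviour: the row never reaches the segment
      }
      persistedRows.add(payload); // current behaviour: row stored, metadata untouched
      return true;
    }
    latestComparisonValue.put(primaryKey, comparisonValue);
    persistedRows.add(payload);
    return true;
  }

  int persistedRowCount() {
    return persistedRows.size();
  }

  public static void main(String[] args) {
    UpsertIngestionSketch current = new UpsertIngestionSketch(false);
    current.ingest("k1", 10L, "fresh");
    current.ingest("k1", 5L, "late");
    System.out.println(current.persistedRowCount()); // 2: the late row is still stored

    UpsertIngestionSketch proposed = new UpsertIngestionSketch(true);
    proposed.ingest("k1", 10L, "fresh");
    proposed.ingest("k1", 5L, "late");
    System.out.println(proposed.persistedRowCount()); // 1: the late row is dropped
  }
}
```

With the flag off, the late row takes up space in the segment even though it never becomes the valid record for its key; with the flag on, it is skipped before it is indexed.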
   
   We should also add a virtual boolean column `$.isOutOfOrderEvent` to detect
these rows, which would make debugging with `skipUpsert` easier.
This is orthogonal to the `disableIngestingOutOfOrderEvents` config proposed above.
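
As a rough illustration of what such a column could hold, the sketch below computes a per-row flag in ingestion order. The class `OutOfOrderFlagSketch` and the method `flagRows` are hypothetical names for this example only, and the sketch again assumes that a strictly lower comparison value than the latest one seen for the same primary key marks a row as out of order.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Toy computation of an out-of-order flag per row; not Pinot's implementation. */
class OutOfOrderFlagSketch {
  /**
   * Returns one flag per row, in ingestion order. Row i is described by
   * primaryKeys.get(i) and comparisonValues.get(i).
   */
  static List<Boolean> flagRows(List<String> primaryKeys, List<Long> comparisonValues) {
    if (primaryKeys.size() != comparisonValues.size()) {
      throw new IllegalArgumentException("Both lists must have the same length");
    }
    Map<String, Long> latest = new HashMap<>();
    List<Boolean> flags = new ArrayList<>();
    for (int i = 0; i < primaryKeys.size(); i++) {
      String key = primaryKeys.get(i);
      long value = comparisonValues.get(i);
      Long seen = latest.get(key);
      // Assumption: strictly lower comparison value means out of order.
      boolean outOfOrder = seen != null && value < seen;
      flags.add(outOfOrder);
      if (!outOfOrder) {
        latest.put(key, value); // only in-order rows advance the latest value
      }
    }
    return flags;
  }

  public static void main(String[] args) {
    List<Boolean> flags =
        flagRows(List.of("k1", "k1", "k1", "k2"), List.of(10L, 5L, 12L, 7L));
    System.out.println(flags); // prints [false, true, false, false]
  }
}
```

When reading all rows with `skipUpsert`, a flag like this would let us separate rows that were never part of the upsert view from rows that were valid at some point and later replaced.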
   
   cc @Jackie-Jiang, let me know your thoughts on this.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org