jadami10 commented on PR #13298:
URL: https://github.com/apache/pinot/pull/13298#issuecomment-2361386066

   > the earlier metric was really noisy since it relies on time column value 
instead of ingestion time, which led to false positives.
   
   That sounds like a misuse of `StreamMessageMetadata`. There are 2 fields 
in 
https://github.com/apache/pinot/blob/master/pinot-spi/src/main/java/org/apache/pinot/spi/stream/StreamMessageMetadata.java#L71,
 `getRecordIngestionTimeMs` and `getFirstStreamRecordIngestionTimeMs`, with 2 
corresponding metrics to distinguish between source time and publish 
time.
   
   > would making this metric configurable help? that way you'd be able to 
disable it without code changes
   
   Only if we don't have a way to cap frequency. And it should be off by 
default.
   
   > let me also see if there's a way to reduce frequency
   
   I think the key part here is that we need to cap the frequency. Large Pinot 
deployments may have thousands of tables and hundreds of thousands of 
partitions consumed, so the baseline is O(100k) calls. But adding a new table 
consuming N partitions shouldn't add N more calls. We effectively need a global 
throttle, though I don't think there's a way to prevent starvation at large 
enough scale.
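
   To make the global throttle idea concrete, here is a rough sketch, not 
existing Pinot code. `GlobalFetchThrottle`, `register`, `nextBatch`, and 
`ticksPerFullCycle` are all hypothetical names, and it assumes some periodic 
task calls `nextBatch()` once per tick. Every partition shares one rotating 
queue, so the number of calls per tick stays fixed no matter how many 
partitions are registered.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of a global, round-robin fetch throttle. Not a Pinot class. */
public class GlobalFetchThrottle {
  // All partitions across all tables share this single queue.
  private final ArrayDeque<String> _queue = new ArrayDeque<>();
  private final int _maxCallsPerTick;

  public GlobalFetchThrottle(int maxCallsPerTick) {
    if (maxCallsPerTick <= 0) {
      throw new IllegalArgumentException("maxCallsPerTick must be positive");
    }
    _maxCallsPerTick = maxCallsPerTick;
  }

  /** Adds a partition (for example "table__partitionId") to the rotation. */
  public synchronized void register(String partitionKey) {
    _queue.addLast(partitionKey);
  }

  /** Returns the partitions allowed to fetch this tick: at most maxCallsPerTick. */
  public synchronized List<String> nextBatch() {
    int n = Math.min(_maxCallsPerTick, _queue.size());
    List<String> batch = new ArrayList<>(n);
    for (int i = 0; i < n; i++) {
      String key = _queue.pollFirst();
      batch.add(key);
      _queue.addLast(key); // rotate to the back so every partition gets a turn
    }
    return batch;
  }

  /** Ticks needed before every registered partition has been refreshed once. */
  public synchronized int ticksPerFullCycle() {
    return (_queue.size() + _maxCallsPerTick - 1) / _maxCallsPerTick;
  }
}
```

   With this shape, adding a table with N partitions adds zero calls per tick 
and only lengthens the cycle. That is also where the starvation concern shows 
up: `ticksPerFullCycle()` grows linearly with the partition count, so at large 
enough scale each partition's metric is refreshed too rarely to be useful.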


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

