ege-st commented on code in PR #11496:
URL: https://github.com/apache/pinot/pull/11496#discussion_r1323502750


##########
pinot-core/src/main/java/org/apache/pinot/core/transport/InstanceRequestHandler.java:
##########
@@ -321,6 +322,10 @@ private void sendResponse(ChannelHandlerContext ctx, 
String tableNameWithType, l
         LOGGER.info("Slow query: request handler processing time: {}, send 
response latency: {}, total time to handle "
             + "request: {}", queryProcessingTimeMs, sendResponseLatencyMs, 
totalQueryTimeMs);
       }
+      if (serializedDataTable.length > LARGE_RESPONSE_SIZE_THRESHOLD_BYTES) {
+        LOGGER.warn("Large query: response size in bytes: {}, table name {}",
+            serializedDataTable.length, tableNameWithType);

Review Comment:
   @siddharthteotia yes, the risk with the message size threshold is that if we 
are not very careful in tuning we'll start killing previously fine queries. 
Hence the need for a histogram metric to get an idea of the size of the 
messages. Furthermore, it will heavily dependent on cluster hardware and 
configuration. It requires a lot of thought in terms of both how it's 
implemented, what the defaults are (or if it's off by default), and how we tune 
it in such a way that we don't break functional queries.
   
   The other key point is: we still have to deal with this direct buffer OOM 
issue even with the message size threshold. Remember, the core issue here is 
that the Broker becomes a bad actor in the cluster and starts accepting queries 
and then, essentially, throwing them out. We have to address _that_ issue 
before we worry about more nuanced ways of reducing the likelihood of an OOM 
happening.
   
   My proposal is that we do a multipart fix: 
   1. we either use this fix or we treat an OOM as a fault and shut the Broker 
down (so that it is removed from the cluster asap and ceases to be a bad 
actor), 
   2. once that's out, we add in metrics to collect about message properties, 
   3. we add a message size threshold feature combined with a robust guide for 
enabling and tuning that will greatly reduce the risk of the OOM happening.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org
For additional commands, e-mail: commits-h...@pinot.apache.org

Reply via email to