ankitsultana commented on code in PR #10108:
URL: https://github.com/apache/pinot/pull/10108#discussion_r1067648250


##########
pinot-query-runtime/src/main/java/org/apache/pinot/query/mailbox/GrpcMailboxService.java:
##########
@@ -44,16 +48,17 @@
  * </ul>
  */
 public class GrpcMailboxService implements MailboxService<TransferableBlock> {
+  private static final Duration MAILBOX_CACHE_EXPIRY = Duration.ofMinutes(5);
   // channel manager
   private final ChannelManager _channelManager;
   private final String _hostname;
   private final int _mailboxPort;
 
   // maintaining a list of registered mailboxes.
-  private final ConcurrentHashMap<String, ReceivingMailbox<TransferableBlock>> 
_receivingMailboxMap =
-      new ConcurrentHashMap<>();
-  private final ConcurrentHashMap<String, SendingMailbox<TransferableBlock>> 
_sendingMailboxMap =
-      new ConcurrentHashMap<>();
+  private final Cache<String, ReceivingMailbox<TransferableBlock>> 
_receivingMailboxCache =
+      
CacheBuilder.newBuilder().expireAfterAccess(MAILBOX_CACHE_EXPIRY.toMinutes(), 
TimeUnit.MINUTES).build();
+  private final Cache<String, SendingMailbox<TransferableBlock>> 
_sendingMailboxCache =
+      
CacheBuilder.newBuilder().expireAfterAccess(MAILBOX_CACHE_EXPIRY.toMinutes(), 
TimeUnit.MINUTES).build();

Review Comment:
   Yeah I agree on the ownership logic and closing the OpChain gracefully. I 
believe @61yao is working on it?
   
   This PR is not to address the design itself and it rather leaves a 
code-smell, but the intention was to make the main branch usable since right 
now we were seeing servers hitting OOMs every few hours.
   
   > using expireAfter isn't going to work IMO b/c the access pattern needs to 
also close the associated opChain indicating they are no longer valid.
   
   I attached a screenshot in the description. This does seem to have made our 
internal build more usable. My guess is that the following scenario was 
happening a lot in our build leading to large leaks:
   
   1. Leaf stage is very costly and takes longer than the timeout
   2. MailboxReceive for the receiver of the leaf stages have already exited 
and hung up.
   3. When the code finally reaches an InMemory MailboxSend, the mailbox send 
ends up populating the the queue with the TransferableBlock ultimately leaking 
it via the map.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org
For additional commands, e-mail: commits-h...@pinot.apache.org

Reply via email to