ankitsultana commented on code in PR #10108: URL: https://github.com/apache/pinot/pull/10108#discussion_r1067648250
########## pinot-query-runtime/src/main/java/org/apache/pinot/query/mailbox/GrpcMailboxService.java: ########## @@ -44,16 +48,17 @@ * </ul> */ public class GrpcMailboxService implements MailboxService<TransferableBlock> { + private static final Duration MAILBOX_CACHE_EXPIRY = Duration.ofMinutes(5); // channel manager private final ChannelManager _channelManager; private final String _hostname; private final int _mailboxPort; // maintaining a list of registered mailboxes. - private final ConcurrentHashMap<String, ReceivingMailbox<TransferableBlock>> _receivingMailboxMap = - new ConcurrentHashMap<>(); - private final ConcurrentHashMap<String, SendingMailbox<TransferableBlock>> _sendingMailboxMap = - new ConcurrentHashMap<>(); + private final Cache<String, ReceivingMailbox<TransferableBlock>> _receivingMailboxCache = + CacheBuilder.newBuilder().expireAfterAccess(MAILBOX_CACHE_EXPIRY.toMinutes(), TimeUnit.MINUTES).build(); + private final Cache<String, SendingMailbox<TransferableBlock>> _sendingMailboxCache = + CacheBuilder.newBuilder().expireAfterAccess(MAILBOX_CACHE_EXPIRY.toMinutes(), TimeUnit.MINUTES).build(); Review Comment: Yeah I agree on the ownership logic and closing the OpChain gracefully. I believe @61yao is working on it? This PR is not to address the design itself and it rather leaves a code-smell, but the intention was to make the main branch usable since right now we were seeing servers hitting OOMs every few hours. > using expireAfter isn't going to work IMO b/c the access pattern needs to also close the associated opChain indicating they are no longer valid. I attached a screenshot in the description. This does seem to have made our internal build more usable. My guess is that the following scenario was happening a lot in our build leading to large leaks: 1. Leaf stage is very costly and takes longer than the timeout 2. MailboxReceive for the receiver of the leaf stages have already exited and hung up. 3. When the code finally reaches an InMemory MailboxSend, the mailbox send ends up populating the the queue with the TransferableBlock ultimately leaking it via the map. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org For additional commands, e-mail: commits-h...@pinot.apache.org