[GitHub] [pinot] 61yao commented on pull request #9836: [multistage][bugfix] Move sending mailbox instance from global map to request context

GitBox Mon, 21 Nov 2022 13:11:15 -0800


61yao commented on PR #9836:
URL: https://github.com/apache/pinot/pull/9836#issuecomment-1322647075


   > Thanks @61yao! I think you've identified the right problem to solve, but I 
think the abstractions in this PR aren't what we need.
   > 
   > From a code ownership perspective, I don't think it makes sense for an 
operator chain to "own" the underlying physical resources it uses to 
send/receive data. That'll make it much more difficult in the future to share 
those resources, and it makes management of those resources happen via various 
callbacks/asynchronous actions - that in turn makes it harder to debug and 
harder to reason about.
   > 
   > Instead, I think the current abstraction is pretty good (there's a 
centralized MailboxService that maintains the mailboxes and there's only one of 
them for the lifetime of a Pinot server). The problem you've identified is that 
we need some way to let that centralized service clean up resources that don't 
exist. I think there might be various different triggers to trigger that:
   > 
   > 1. (as you've identified) an operator finished (either successfully or in 
error)
   > 2. there might be some kind of timeouts
   > 3. there might be some kind of admin operation that forces it to clean up 
those mailboxes
   > 
   > The design here (decentralizing the mailbox ownership) would make 2/3 much 
more difficult - and it leaks information into places that don't need that 
information.
   > 
   > As an aside, I don't think contention for the concurrent hashmap is a 
problem - so long as there isn't key contention (which there should almost 
never be) it will perform extremely fast (and access to it is almost certainly 
not a bottleneck compared to all the other things a query needs to do).
   
   I agree centralized resource should be in central place. but resources 
created per request should clean up per request instead of leaving it to 
centralized request management. 
   
   For example, shared channel between servers is shared and should be managed 
globally. 
   However, streaming channel, instance or request opened by per request should 
be cleaned up per request rather than leaving it in central map especially 
sending mailbox. 
   
   I agree it should not leave inside opchain. but I haven't found a good place 
to hold the request context yet. 
   
   It doesn't make sense to put per request resource in the central place and 
clean it up later. It makes it so much easier to leak. 
   
   Ideally, timeout and error should receive the same exit point where we clean 
up per request resource. 
   
   The actual physical resource say the data block can live globally but the 
instance of mailbox and the streaming channel should still be managed per 
resource purpose.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org
For additional commands, e-mail: commits-h...@pinot.apache.org

[GitHub] [pinot] 61yao commented on pull request #9836: [multistage][bugfix] Move sending mailbox instance from global map to request context

Reply via email to