walterddr opened a new issue, #10755:
URL: https://github.com/apache/pinot/issues/10755

   Hi All, 
   we were discussing the threading model of the current V2 engine execution 
runtime and we observed several issues
   
   Issues
   ===
   1. under a high-QPS environment we see a lot of thread contention due to threads waiting on the mailbox sender;
   2. we have some issues when a single opChain waits on multiple receiving mailboxes;
   3. we also want to add a "pipeline breaker" concept to the executing opChain, where some operators will not produce any data until all input blocks have been consumed (for example the sort operator or the aggregate/group-by operator); see the small sketch after this list
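   
   As a tiny illustration of the pipeline-breaker idea in (3), here is a simplified sort operator (illustrative only, not the Pinot operator API): it has to buffer every upstream block before it can emit anything, so it naturally blocks the chain until its input is fully consumed.
   
   ```java
   import java.util.ArrayList;
   import java.util.Iterator;
   import java.util.List;
   
   // Illustrative sketch of a "pipeline breaker": a sort operator cannot emit
   // anything until it has consumed every input block. Not the Pinot operator API.
   public class PipelineBreakerSketch {
   
     static List<Integer> sortAllInput(Iterator<List<Integer>> inputBlocks) {
       List<Integer> buffered = new ArrayList<>();
       // Pipeline breaker: consume (and potentially block on) ALL upstream blocks first ...
       while (inputBlocks.hasNext()) {
         buffered.addAll(inputBlocks.next());
       }
       // ... and only then produce output.
       buffered.sort(Integer::compareTo);
       return buffered;
     }
   
     public static void main(String[] args) {
       List<List<Integer>> blocks = List.of(List.of(3, 1), List.of(2), List.of(5, 4));
       System.out.println(sortAllInput(blocks.iterator()));  // [1, 2, 3, 4, 5]
     }
   }
   ```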
   
   Status Quo
   ===
   the current execution model requires each of these runtime operators to return a no-op block and yield the opChain out of the execution threadpool, because blocking a platform thread could otherwise cause a distributed deadlock (see more details in https://docs.google.com/document/d/1Vh_UAaY9WWB3dfRUTAoNzU4BrPAQEni7Xd6w84SH_Ow/edit# and https://docs.google.com/document/d/1XAMHAlhFbINvX-kK1ANlzbRz4_RkS0map4qhqs1yDtE/edit?usp=drivesdk)
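   
   For reference, a rough, self-contained sketch of that pattern (class and method names are illustrative stand-ins, not the actual Pinot V2 operator/mailbox classes): the receive operator returns a no-op block whenever its mailbox is empty, so the scheduler can park the whole opChain instead of blocking the thread.
   
   ```java
   import java.util.Queue;
   import java.util.concurrent.ConcurrentLinkedQueue;
   
   // Illustrative sketch only (hypothetical names, not the actual Pinot V2 classes):
   // the current pattern where an operator returns a no-op block when its mailbox
   // has no data yet, so the opChain can yield the execution thread.
   public class NoOpYieldSketch {
   
     // Stand-in for a data block; a real block would carry rows / EOS metadata.
     record Block(boolean noOp, String payload) {
       static Block noOp() { return new Block(true, null); }
       static Block data(String payload) { return new Block(false, payload); }
     }
   
     // Stand-in for a receiving mailbox fed by another stage.
     static class Mailbox {
       private final Queue<Block> _queue = new ConcurrentLinkedQueue<>();
       void offer(Block block) { _queue.offer(block); }
       Block poll() { return _queue.poll(); }
     }
   
     // Stand-in for a receive operator in the opChain.
     static class ReceiveOperator {
       private final Mailbox _mailbox;
       ReceiveOperator(Mailbox mailbox) { _mailbox = mailbox; }
   
       Block getNextBlock() {
         Block block = _mailbox.poll();
         // Nothing available yet: return a no-op block so the scheduler can park
         // the whole opChain and re-schedule it later, rather than blocking here.
         return block != null ? block : Block.noOp();
       }
     }
   
     public static void main(String[] args) {
       Mailbox mailbox = new Mailbox();
       ReceiveOperator op = new ReceiveOperator(mailbox);
       System.out.println(op.getNextBlock());      // no-op: mailbox is empty
       mailbox.offer(Block.data("row-batch-1"));
       System.out.println(op.getNextBlock());      // real data block
     }
   }
   ```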
   
   this is extremely efficient in terms of managing the opChain entering and leaving the scheduler service, and thus we were able to utilize the threadpool with less OS thread context switching; however, it also poses several problems:
   - the no-op block populating logic is hard to maintain, and each operator handles the no-op population a bit differently
   - it is hard to implement multiple wake-ups, because the system cannot resume from a particular parked operator (the entry point is always the root of the opChain)
   - it makes the operator chain complex, as it needs to handle blocking context
   
   Proposal
   ===
   our goals are to:
   1. resolve the opChain thread contention (where no-op blocks are not, or cannot be, properly returned)
   2. make it easy to add new operators without having to handle blocking context in each operator
   3. allow adding more complex blocking mechanisms without distributed deadlock (such as the pipeline breaker mentioned above)
   
   Candidate Solutions
   ===
   1. we can handle no-op blocks solely in the multistage operator base class; this would still be complex as the operator needs to call the base class logic, and it would still leave the operator logic somewhat blocking-context aware.
   2. we can also remove the no-op block entirely and signal blocking directly from the operator (the operator has access to the opChainExecutionContext, which doesn't require a bottom-up return through the opChain); see the sketch below
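   
   A rough sketch of what candidate (2) could look like, using hypothetical names (BlockingContext, blockedOn, ...) rather than the actual classes from the POC PR: when an operator has nothing to return, it records the signal it is blocked on in the shared context, and the scheduler parks/resumes the opChain based on that signal instead of a no-op block bubbling up the chain.
   
   ```java
   import java.util.Queue;
   import java.util.concurrent.CompletableFuture;
   import java.util.concurrent.ConcurrentLinkedQueue;
   
   // Hypothetical sketch of candidate (2): the operator reports what it is blocked
   // on through a shared execution context. Names are illustrative and do not
   // necessarily match the POC PR.
   public class DirectBlockingSignalSketch {
   
     // Stand-in for the per-opChain execution context visible to every operator.
     static class BlockingContext {
       private volatile CompletableFuture<Void> _blockedOn = CompletableFuture.completedFuture(null);
   
       // An operator registers the signal it is waiting on; the scheduler parks
       // the opChain until this future completes, then resumes it.
       void blockedOn(CompletableFuture<Void> signal) { _blockedOn = signal; }
       CompletableFuture<Void> currentSignal() { return _blockedOn; }
     }
   
     // Stand-in mailbox that exposes a "data available" signal.
     static class Mailbox {
       private final Queue<String> _queue = new ConcurrentLinkedQueue<>();
       private final CompletableFuture<Void> _dataAvailable = new CompletableFuture<>();
   
       void offer(String block) {
         _queue.offer(block);
         _dataAvailable.complete(null);
       }
   
       String poll() { return _queue.poll(); }
       CompletableFuture<Void> dataAvailable() { return _dataAvailable; }
     }
   
     // Receive operator: no no-op blocks, it just records the blocking signal.
     static class ReceiveOperator {
       private final Mailbox _mailbox;
       private final BlockingContext _context;
   
       ReceiveOperator(Mailbox mailbox, BlockingContext context) {
         _mailbox = mailbox;
         _context = context;
       }
   
       String getNextBlock() {
         String block = _mailbox.poll();
         if (block == null) {
           // Tell the scheduler (via the shared context) exactly what to wait on.
           _context.blockedOn(_mailbox.dataAvailable());
         }
         return block;
       }
     }
   
     public static void main(String[] args) {
       BlockingContext context = new BlockingContext();
       Mailbox mailbox = new Mailbox();
       ReceiveOperator op = new ReceiveOperator(mailbox, context);
   
       if (op.getNextBlock() == null) {
         // Scheduler side: park the opChain and resume when the signal fires.
         context.currentSignal().thenRun(() -> System.out.println("opChain resumed"));
       }
       mailbox.offer("row-batch-1");  // completes the signal, wakes the opChain
     }
   }
   ```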
   
   POC Status
   ===
   we've implemented candidate solution (2) above in https://github.com/walterddr/pinot/pull/58; here are the pros and cons we observed
   
   PRO
   ----
   * it is vastly simpler in terms of the operator and the opChain, as nothing other than the mailbox operator is blocking (see the lines added vs. lines removed in the main classes)
   * it opens up different types of wait-notify (e.g. parking on an operator or on the opChain root, and waiting on different signals for wake-up), because all signaling goes through the opChainExecutionContext; see the sketch after this list
   * with https://openjdk.org/projects/panama/ we think it is best to let the JVM handle the threadpool resource management
       * bear in mind that the current model is still memory heavy, as the OpChain/Context bears the entirety of the operator chain's memory footprint.
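   
   To illustrate the wait-notify point above, a minimal sketch (hypothetical names, not the actual Pinot or POC classes) of how an opChain blocked on several receiving mailboxes could be woken by whichever mailbox receives data first, since all signaling goes through one shared place:
   
   ```java
   import java.util.concurrent.CompletableFuture;
   
   // Hypothetical sketch: an opChain blocked on several receiving mailboxes is
   // woken by whichever one gets data first, instead of polling each mailbox
   // with no-op blocks. Names are illustrative only.
   public class MultiMailboxWakeupSketch {
   
     public static void main(String[] args) {
       // One "data available" signal per receiving mailbox (stand-ins).
       CompletableFuture<Void> mailbox1Ready = new CompletableFuture<>();
       CompletableFuture<Void> mailbox2Ready = new CompletableFuture<>();
   
       // The scheduler parks the opChain on the combined signal: any mailbox
       // receiving data wakes the chain.
       CompletableFuture<Object> wakeUp = CompletableFuture.anyOf(mailbox1Ready, mailbox2Ready);
       wakeUp.thenRun(() -> System.out.println("opChain rescheduled"));
   
       mailbox2Ready.complete(null);  // simulate data arriving on the second mailbox
       wakeUp.join();
     }
   }
   ```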
   
   CON
   ----
   * in our basic tests, we observed ~10-20% thread context switching overhead under high QPS load
   

