walterddr opened a new issue, #10657:
URL: https://github.com/apache/pinot/issues/10657

   Background
   ===
   
   Currently, several abstractions are shared between components in the planner and the runtime. This causes several problems:
   - adding partition-based routing and planning is very complex.
   - information that is only needed at planning and dispatch time leaks into the runtime, where it serves no purpose; because these usages are intermixed, this is hard to change.
   - mailboxes carry far more information than necessary, which makes them hard to identify: `MailboxIdentifier` equality requires all of those fields to be identical.
   - ... many other issues
   
   Proposed changes
   === 
   Several new abstractions are being introduced to replace the current ones.
   - Step 1a: replace `VirtualServer`
   `VirtualServer` is currently a `ServerInstance` plus a `VirtualID`. It will be replaced with
   `Worker`, which represents a unit of work parallelism. A `Worker`:
       (1) is globally indexed per stage;
       (2) is mapped to a single `ServerInstance`, stored in `StageMetadata`;
       (3) carries its partition or segment info in a new abstraction called `WorkerMetadata`.
   
   With this change, `VirtualServer` is removed entirely, and `ServerInstance`, which the runtime does not need, is decoupled from the `VirtualID`/`workerID`, which the runtime does use.
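A minimal Java sketch of the Step 1a model, for illustration only: the type names `ServerInstance`, `WorkerMetadata` and `StageMetadata` come from this proposal, but the record fields, the `serverFor` helper and the `WorkerModelSketch` wrapper are hypothetical and are not Pinot's actual classes.

```java
import java.util.List;
import java.util.Map;

// Hypothetical sketch of the Step 1a model. Only the type names come from the
// proposal; the fields and the serverFor() helper are illustrative assumptions.
final class WorkerModelSketch {

  // Physical placement: only needed at plan and dispatch time.
  record ServerInstance(String hostname, int port) {}

  // Runtime view of one worker: its workerID plus the segments (or partitions) it reads.
  record WorkerMetadata(int workerId, List<String> segments) {
    WorkerMetadata {
      segments = List.copyOf(segments);
    }
  }

  // Per-stage metadata: workerIDs are indexed within the stage, and each one maps
  // to exactly one ServerInstance. Several workers may share the same server.
  record StageMetadata(int stageId, Map<Integer, ServerInstance> workerToServer,
      List<WorkerMetadata> workers) {
    StageMetadata {
      workerToServer = Map.copyOf(workerToServer);
      workers = List.copyOf(workers);
    }

    // Looks up the server a worker is dispatched to; fails fast on unknown IDs.
    ServerInstance serverFor(int workerId) {
      ServerInstance server = workerToServer.get(workerId);
      if (server == null) {
        throw new IllegalArgumentException(
            "Unknown workerID " + workerId + " in stage " + stageId);
      }
      return server;
    }
  }
}
```

In this sketch, runtime operators would only need `WorkerMetadata`, while `serverFor` is meant for planning and dispatch, mirroring the decoupling described above.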
   
   - Step 1b: replace identifiers:
       - `MailboxIdentifier` will use the globally indexed `workerID` to uniquely identify a stream as:
         `reqID|sendingStageID|sendingWorkerID|receivingStageID|receivingStageWorkerID`
       - `OpChainID` will also use the `workerID`: `reqID|stageID|workerID`
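A hedged sketch of the Step 1b identifiers as Java value types. The fields follow the format strings above; the record names, field types and the `|`-joined `toString()` are assumptions for illustration, not Pinot's API.

```java
// Hypothetical value types for the Step 1b identifiers. Field order follows the
// format strings in the proposal; names and types are illustrative assumptions.
final class IdentifierSketch {

  // reqID|sendingStageID|sendingWorkerID|receivingStageID|receivingStageWorkerID
  record MailboxId(long requestId, int sendingStageId, int sendingWorkerId,
      int receivingStageId, int receivingWorkerId) {
    @Override
    public String toString() {
      return requestId + "|" + sendingStageId + "|" + sendingWorkerId + "|"
          + receivingStageId + "|" + receivingWorkerId;
    }
  }

  // reqID|stageID|workerID
  record OpChainId(long requestId, int stageId, int workerId) {
    @Override
    public String toString() {
      return requestId + "|" + stageId + "|" + workerId;
    }
  }
}
```

Because records derive `equals` and `hashCode` from their components, two IDs compare equal exactly when all listed fields match, and no `ServerInstance` details take part in the comparison.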
   
   - Step 2: support Hash-Partitioned data distribution
   see: https://docs.google.com/document/d/1CdvxmOOctk6kS5PdgCy7f5KVh5urw4YY0YZGbwuPJt4/edit#
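The linked doc covers the actual design. As a generic illustration only, hash-partitioned distribution usually means each sending worker routes a row to receiving worker `hash(key) mod numReceivingWorkers`. The sketch below assumes Java's `hashCode()` as the hash function and uses hypothetical names; Pinot may hash differently.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Objects;

// Generic hash-partitioning sketch; class and method names are hypothetical and
// the hash function (Object.hashCode) is an assumption, not Pinot's choice.
final class HashDistributionSketch {

  // Picks the receiving worker for a key: hash modulo the receiving worker count.
  static int targetWorker(Object key, int numReceivingWorkers) {
    if (numReceivingWorkers <= 0) {
      throw new IllegalArgumentException("numReceivingWorkers must be positive");
    }
    // Objects.hashCode maps null to 0; floorMod keeps negative hashes in range.
    return Math.floorMod(Objects.hashCode(key), numReceivingWorkers);
  }

  // Splits rows into one bucket per receiving worker based on the key column.
  static List<List<Object[]>> partition(List<Object[]> rows, int keyColumn,
      int numReceivingWorkers) {
    List<List<Object[]>> buckets = new ArrayList<>();
    for (int i = 0; i < numReceivingWorkers; i++) {
      buckets.add(new ArrayList<>());
    }
    for (Object[] row : rows) {
      buckets.get(targetWorker(row[keyColumn], numReceivingWorkers)).add(row);
    }
    return buckets;
  }
}
```

Rows with equal keys always land in the same bucket, so each receiving worker sees every row for the keys it owns.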
   
   - Step 3: support worker assignment based on data partitioning and worker parallelism
   see: https://docs.google.com/document/d/1SKDKV6LXr4uFFUsR3djz5BWWMqcSJIYEqJBoL1zeDD8/edit
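Again deferring to the linked doc for the real design, one simple way to tie workers to data partitions is a modulo assignment: partition `p` goes to worker `p % numWorkers`. This is an assumed scheme with hypothetical names, shown only to make the idea concrete. Inputs partitioned with the same function and partition count then place matching partitions on the same worker, which is the property a colocated join would rely on.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical partition-to-worker assignment; the modulo scheme is an
// illustrative assumption, not the design in the linked doc.
final class PartitionAssignmentSketch {

  // Returns, for each worker 0..numWorkers-1, the partition IDs it should process.
  static List<List<Integer>> assign(int numPartitions, int numWorkers) {
    if (numPartitions <= 0 || numWorkers <= 0) {
      throw new IllegalArgumentException("numPartitions and numWorkers must be positive");
    }
    List<List<Integer>> assignment = new ArrayList<>();
    for (int w = 0; w < numWorkers; w++) {
      assignment.add(new ArrayList<>());
    }
    // Partition p always lands on worker p % numWorkers.
    for (int p = 0; p < numPartitions; p++) {
      assignment.get(p % numWorkers).add(p);
    }
    return assignment;
  }
}
```

For example, `assign(8, 3)` gives worker 0 partitions 0, 3 and 6, worker 1 partitions 1, 4 and 7, and worker 2 partitions 2 and 5.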
   
   
   CC @Jackie-Jiang @xiangfu0 @ankitsultana @somandal @siddharthteotia 


-- 
This is an automated message from the Apache Git Service.