walterddr opened a new issue, #10657: URL: https://github.com/apache/pinot/issues/10657
Background
===
Currently, multiple abstractions are reused across different components in the planner and the runtime. This causes several problems:
- adding partition-based routing and planning is very complex
- information that is only required at plan time and dispatch time leaks into the runtime, where it is not useful, yet its usage is mixed in and hard to change
- the mailbox uses far more information than necessary, and this is hard to untangle because `MailboxIdentifier` equality requires all of that information to be identical
- ... many other issues

Proposed changes
===
Several new abstractions are being introduced to replace the current ones.
- Step 1a: replace `VirtualServer`
  `VirtualServer` is currently a `ServerInstance + VirtualID`. It will be replaced with `Worker`, which indicates parallelism of work. A `Worker`:
  1. is globally indexed per stage;
  2. is mapped to a single `ServerInstance`, stored in `StageMetadata`;
  3. contains partition or segment info, which will be put into a new abstraction called `WorkerMetadata`.

  With this, `VirtualServer` is completely removed, and `ServerInstance` (which is not useful at runtime) is decoupled from `VirtualID` or `workerID` (which is used at runtime).
- Step 1b: replace identifiers
  - `MailboxIdentifier` will use `workerID`, which is globally indexed, to uniquely identify a stream as: `reqID|sendingStageID|sendingWorkerID|receivingStageID|receivingStageWorkerID`
  - `OpChainID` will use `workerID` as well: `reqID|stageID|workerID`
- Step 2: support hash-partitioned data distribution; see: https://docs.google.com/document/d/1CdvxmOOctk6kS5PdgCy7f5KVh5urw4YY0YZGbwuPJt4/edit#
- Step 3: support worker assignment based on data partition and worker/parallelism; see: https://docs.google.com/document/d/1SKDKV6LXr4uFFUsR3djz5BWWMqcSJIYEqJBoL1zeDD8/edit

CC @Jackie-Jiang @xiangfu0 @ankitsultana @somandal @siddharthteotia
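To make Steps 1a and 1b more concrete, here is a minimal Java sketch of how a stage-scoped worker index could drive both identifier formats. It is an illustration under assumptions, not the actual Pinot implementation: the class shapes, field names, and helper methods (`WorkerIds.opChainId`, `WorkerIds.mailboxId`, `addWorker`) are all hypothetical, and only the `|`-separated layouts come from the proposal.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Illustrative sketch only: every class and method name here is hypothetical
// and does not mirror the real Pinot classes.

/** Per-worker info: the partition or segment data a worker is responsible for. */
final class WorkerMetadata {
  final int workerId;          // index of the worker within its stage
  final List<String> segments; // segment (or partition) names assigned to it

  WorkerMetadata(int workerId, List<String> segments) {
    this.workerId = workerId;
    this.segments = segments;
  }
}

/** Per-stage info: holds the workers and the workerId -> server mapping. */
final class StageMetadata {
  final int stageId;
  final Map<Integer, WorkerMetadata> workers = new LinkedHashMap<>();
  final Map<Integer, String> workerToServer = new LinkedHashMap<>();

  StageMetadata(int stageId) {
    this.stageId = stageId;
  }

  void addWorker(WorkerMetadata worker, String serverInstance) {
    workers.put(worker.workerId, worker);
    workerToServer.put(worker.workerId, serverInstance);
  }
}

/** Builds identifiers from request, stage, and worker IDs only (no server info). */
final class WorkerIds {
  private WorkerIds() {
  }

  // Layout from the proposal: reqID|stageID|workerID
  static String opChainId(long reqId, int stageId, int workerId) {
    return reqId + "|" + stageId + "|" + workerId;
  }

  // Layout from the proposal:
  // reqID|sendingStageID|sendingWorkerID|receivingStageID|receivingStageWorkerID
  static String mailboxId(long reqId, int sendingStageId, int sendingWorkerId,
      int receivingStageId, int receivingWorkerId) {
    return reqId + "|" + sendingStageId + "|" + sendingWorkerId
        + "|" + receivingStageId + "|" + receivingWorkerId;
  }
}

final class WorkerIdDemo {
  public static void main(String[] args) {
    // Two workers of stage 2 share one server; only the worker index differs.
    StageMetadata stage = new StageMetadata(2);
    stage.addWorker(new WorkerMetadata(0, List.of("seg_0", "seg_1")), "server-a");
    stage.addWorker(new WorkerMetadata(1, List.of("seg_2")), "server-a");

    System.out.println(WorkerIds.opChainId(7L, stage.stageId, 1));
    System.out.println(WorkerIds.mailboxId(7L, 1, 0, stage.stageId, 1));
    System.out.println(stage.workerToServer.get(1));
  }
}
```

In this sketch the identifiers never mention a server, so two workers placed on the same server still get distinct mailbox and op-chain IDs, while the server lookup stays in the stage-level metadata for plan and dispatch time.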