ankitsultana opened a new pull request, #15760: URL: https://github.com/apache/pinot/pull/15760
# Summary

For Realtime streams, there can be scenarios where a very small percentage of records are assigned to the wrong Stream Partition (when partitioning by a key is enabled). In such a case, the Physical Optimizer assumes the Table Scan to be un-partitioned. This is overkill for some use-cases, where minute errors (e.g. <0.0001%) are acceptable.

This PR adds a query option to opt in to tolerating such errors. When the option is enabled, the partition of a segment is inferred from its name. This only works with Realtime segments, since for offline segments batch ingestion should easily be able to guarantee that the partitioning is done correctly (we also don't have a standard way to infer a partition in that case).

I have also added some metrics that were missing with the Physical Optimizer (join count, window count, etc.). These were emitted in `RelToPlanNodeConverter`; I now emit them from a centralized place instead, so we can also emit more metrics in the future.

# Testing

Added UTs for Leaf Stage worker assignment. We are also testing this out in one of our bigger clusters.
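
To illustrate the name-based inference mentioned in the summary, here is a minimal sketch and not the code from this PR. It assumes Realtime segment names follow the `tableName__partitionId__sequenceNumber__creationTime` layout; the class `SegmentPartitionSketch` and the method `inferPartitionId` are hypothetical names introduced only for this example.

```java
import java.util.OptionalInt;

// Hypothetical helper, for illustration only; not part of the Pinot code base.
class SegmentPartitionSketch {
    // Assumed separator between the fields of a Realtime segment name.
    private static final String SEPARATOR = "__";

    /**
     * Returns the partition id encoded in the segment name, or empty when the
     * name does not match the assumed four-field layout (e.g. offline segments,
     * or table names that themselves contain the separator).
     */
    static OptionalInt inferPartitionId(String segmentName) {
        String[] parts = segmentName.split(SEPARATOR);
        if (parts.length != 4) {
            return OptionalInt.empty();
        }
        try {
            int partitionId = Integer.parseInt(parts[1]);
            return partitionId >= 0 ? OptionalInt.of(partitionId) : OptionalInt.empty();
        } catch (NumberFormatException e) {
            // Second field is not numeric, so no partition can be inferred.
            return OptionalInt.empty();
        }
    }

    public static void main(String[] args) {
        System.out.println(inferPartitionId("orders__3__42__20240101T0000Z"));
        System.out.println(inferPartitionId("orders_OFFLINE_0"));
    }
}
```

Returning an empty result for any name that does not parse keeps the fallback conservative: a caller could then treat the scan as un-partitioned, which matches the existing behavior described above.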