ankitsultana opened a new pull request, #15760:
URL: https://github.com/apache/pinot/pull/15760

   # Summary
   
   For realtime streams, a very small percentage of records can end up 
assigned to the wrong Stream Partition (when partitioning by a key is 
enabled).
   
   In such a case, the Physical Optimizer will assume the Table Scan to be 
un-partitioned.
   
   This is overkill for use-cases where minute errors (e.g. <0.0001%) are 
acceptable.
   
   This PR adds a query option that lets the Physical Optimizer still treat 
such a Table Scan as partitioned. When the option is enabled, we infer the 
partition of each segment from its name. This only works with Realtime 
segments: for offline segments, batch ingestion should easily be able to 
guarantee that the partitioning is done correctly, and we also don't have a 
standard way to infer a partition from the segment name in that case.
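
   As a rough illustration of name-based inference (not the code in this PR): 
the sketch below assumes the realtime segment naming convention 
`table__partitionId__sequenceNumber__creationTime`. The class 
`SegmentPartitionSketch` and the method `inferPartition` are hypothetical names 
used only for this example. Names that do not match the pattern yield an empty 
result, which a caller could treat as "un-partitioned".

```java
import java.util.OptionalInt;

// Hypothetical helper, not part of Pinot: infers a stream partition id
// from a realtime segment name of the assumed form
//   <table>__<partitionId>__<sequenceNumber>__<creationTime>
public final class SegmentPartitionSketch {
  private static final String SEPARATOR = "__";

  private SegmentPartitionSketch() {
  }

  /**
   * Returns the partition id encoded in the segment name, or empty if the
   * name does not follow the assumed four-part realtime convention.
   */
  public static OptionalInt inferPartition(String segmentName) {
    String[] parts = segmentName.split(SEPARATOR);
    if (parts.length != 4) {
      // Offline or otherwise non-conforming names: no inference possible.
      return OptionalInt.empty();
    }
    try {
      int partition = Integer.parseInt(parts[1]);
      return partition >= 0 ? OptionalInt.of(partition) : OptionalInt.empty();
    } catch (NumberFormatException e) {
      return OptionalInt.empty();
    }
  }

  public static void main(String[] args) {
    // A realtime-style name carries its partition id in the second field.
    System.out.println(inferPartition("orders__3__42__20250512T0000Z"));
    // An offline-style name does not match, so nothing is inferred.
    System.out.println(inferPartition("orders_OFFLINE_0"));
  }
}
```

   Returning an empty result rather than throwing keeps the fallback explicit: 
a segment whose partition cannot be inferred can simply be handled as it is 
today.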
   
   I have also added some metrics which were missing with the Physical 
Optimizer (join count, window count, etc.). These were previously emitted in 
`RelToPlanNodeConverter`. I now emit them from a centralized place, so we can 
also add more metrics in the future.
   
   # Testing
   
   Added unit tests for Leaf Stage worker assignment. We are also testing this 
out in one of our bigger clusters.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

