zhuqi-lucas commented on issue #21598:
URL: https://github.com/apache/datafusion/issues/21598#issuecomment-4295247616

   Thanks @alamb! The `try_next_reader` / `peek_next_row_group` / 
`skip_next_row_group` APIs in arrow-rs are exactly what we need for dynamic 
row group (RG) pruning.
   
   In #21580 we've built the statistics-driven TopK optimization chain (file 
reorder + RG reorder + stats init + cumulative prune), which gives a 17-60x 
improvement on sorted data by skipping RGs at planning time. But for queries 
with a WHERE clause, and for overlapping RGs, we need runtime dynamic pruning: 
checking the TopK threshold against the next RG's statistics between reads.
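To make the planning-time half concrete, here is a minimal sketch of one way cumulative pruning could work for `ORDER BY col ASC LIMIT k`, assuming no WHERE clause and no NULLs. `RowGroupStats` and `cumulative_prune` are hypothetical names; this is not the code in #21580. Row groups are visited in ascending `min` order. Once the visited groups hold at least `k` rows, the largest `max` among them is an upper bound on the k-th value, so any remaining group whose `min` exceeds that bound can be skipped.

```rust
/// Hypothetical min/max/row-count statistics for the sort column of one row group.
#[derive(Debug, Clone, Copy)]
struct RowGroupStats {
    min: i64,
    max: i64,
    num_rows: usize,
}

/// Planning-time pruning for `ORDER BY col ASC LIMIT k`, assuming no WHERE
/// clause and no NULLs. Returns the indices of the row groups to read, in the
/// order they should be read.
fn cumulative_prune(stats: &[RowGroupStats], k: usize) -> Vec<usize> {
    if k == 0 {
        return Vec::new(); // LIMIT 0 needs no data at all
    }
    // Reorder row groups by min so the most promising ones come first.
    let mut order: Vec<usize> = (0..stats.len()).collect();
    order.sort_by_key(|&i| stats[i].min);

    let mut keep = Vec::new();
    let mut rows_seen = 0usize;
    let mut max_seen = i64::MIN;
    // Upper bound on the k-th smallest value, known once >= k rows are covered.
    let mut threshold: Option<i64> = None;

    for i in order {
        let rg = stats[i];
        if let Some(t) = threshold {
            if rg.min > t {
                // Sorted by min, so this and every later row group can be pruned.
                break;
            }
        }
        keep.push(i);
        if threshold.is_none() {
            rows_seen += rg.num_rows;
            max_seen = max_seen.max(rg.max);
            if rows_seen >= k {
                // At least k rows are <= max_seen, so the k-th value is too.
                threshold = Some(max_seen);
            }
        }
    }
    keep
}

fn main() {
    // Non-overlapping ranges (sorted data), stored out of order in the file.
    let sorted = [
        RowGroupStats { min: 200, max: 299, num_rows: 100 },
        RowGroupStats { min: 0, max: 99, num_rows: 100 },
        RowGroupStats { min: 100, max: 199, num_rows: 100 },
    ];
    println!("{:?}", cumulative_prune(&sorted, 10)); // prints [1]

    // Overlapping ranges: the first group's wide range keeps the bound loose.
    let overlapping = [
        RowGroupStats { min: 0, max: 250, num_rows: 100 },
        RowGroupStats { min: 100, max: 199, num_rows: 100 },
        RowGroupStats { min: 200, max: 299, num_rows: 100 },
    ];
    println!("{:?}", cumulative_prune(&overlapping, 10)); // prints [0, 1, 2]
}
```

The second call illustrates the limitation above: with overlapping ranges the bound stays loose and nothing is pruned. A WHERE clause also breaks the argument, because fewer than `num_rows` rows in a group may match, so the row count no longer guarantees `k` results.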
   
   The morsel split via `try_next_reader` would let us do this naturally: when 
a morsel planner calls `try_next_reader`, it can check the 
`peek_next_row_group()` statistics against the current 
`DynamicFilterPhysicalExpr` threshold and call `skip_next_row_group()` if the 
RG can be pruned.
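A rough sketch of that runtime loop, again for `ORDER BY col ASC LIMIT k`. `MockReader` is a hypothetical in-memory stand-in: its method names echo the arrow-rs APIs mentioned above, but its signatures are invented for illustration and do not reflect the real ones. A max-heap holds the `k` best matching values seen so far; once it is full, its top plays the role of the dynamic threshold, and any row group whose `min` statistic exceeds it is skipped without being read.

```rust
use std::collections::{BinaryHeap, VecDeque};

/// Hypothetical row group: min/max statistics for the sort column plus its values.
struct RowGroup {
    min: i64,
    max: i64,
    values: Vec<i64>,
}

/// Stand-in reader; method names echo arrow-rs, signatures are invented.
struct MockReader {
    pending: VecDeque<RowGroup>,
}

impl MockReader {
    /// Statistics (min, max) of the next row group, without reading it.
    fn peek_next_row_group(&self) -> Option<(i64, i64)> {
        self.pending.front().map(|rg| (rg.min, rg.max))
    }
    /// Drop the next row group without decoding it.
    fn skip_next_row_group(&mut self) {
        self.pending.pop_front();
    }
    /// Decode and return the next row group's values.
    fn read_next_row_group(&mut self) -> Option<Vec<i64>> {
        self.pending.pop_front().map(|rg| rg.values)
    }
}

/// `ORDER BY col ASC LIMIT k` with a WHERE predicate.
/// Returns the top-k values in ascending order and the number of skipped row groups.
fn topk_with_dynamic_pruning(
    reader: &mut MockReader,
    k: usize,
    predicate: impl Fn(i64) -> bool,
) -> (Vec<i64>, usize) {
    // Max-heap holding the k smallest matching values seen so far.
    let mut heap: BinaryHeap<i64> = BinaryHeap::new();
    let mut skipped = 0;

    while let Some((min, _max)) = reader.peek_next_row_group() {
        // Dynamic threshold: the worst of the k best values, once the heap is full.
        let threshold = if heap.len() == k { heap.peek().copied() } else { None };
        if let Some(t) = threshold {
            if min > t {
                // Every row in this group sorts after the current top-k: skip it.
                reader.skip_next_row_group();
                skipped += 1;
                continue;
            }
        }
        let values = reader.read_next_row_group().expect("peeked row group exists");
        // Apply the WHERE predicate, then update the top-k heap.
        for v in values.into_iter().filter(|&v| predicate(v)) {
            if heap.len() < k {
                heap.push(v);
            } else if let Some(&top) = heap.peek() {
                if v < top {
                    heap.pop();
                    heap.push(v);
                }
            }
        }
    }
    (heap.into_sorted_vec(), skipped)
}

fn main() {
    // Three non-overlapping row groups: [0, 9], [10, 19], [20, 29].
    let rgs = (0..3).map(|i: i64| RowGroup {
        min: i * 10,
        max: i * 10 + 9,
        values: (i * 10..i * 10 + 10).collect(),
    });
    let mut reader = MockReader { pending: rgs.collect() };
    // WHERE col % 2 = 0 ORDER BY col LIMIT 3
    let (top, skipped) = topk_with_dynamic_pruning(&mut reader, 3, |v| v % 2 == 0);
    println!("top = {:?}, skipped = {}", top, skipped); // prints top = [0, 2, 4], skipped = 2
}
```

Because the predicate is applied to rows as they are read, the threshold is only tightened by rows that actually match, which is why a runtime check can handle the WHERE case that planning-time row counts cannot. Feeding the same row groups in descending order would skip nothing, which is one reason RG reorder still matters alongside the runtime check.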
   
   This is tracked in #21399. Happy to help implement the dynamic pruning logic 
on top of the morsel split once the API is wired up.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
