zhuqi-lucas commented on issue #21598: URL: https://github.com/apache/datafusion/issues/21598#issuecomment-4295247616
Thanks @alamb! The `try_next_reader` / `peek_next_row_group` / `skip_next_row_group` APIs in arrow-rs are exactly what we need for dynamic row group (RG) pruning.

In #21580 we've built the statistics-driven TopK optimization chain (file reorder + RG reorder + stats init + cumulative prune). It gives a 17-60x improvement on sorted data by skipping RGs at planning time. But for queries with a WHERE clause, and for overlapping RGs, we need runtime dynamic pruning: checking the TopK threshold against the next RG's statistics between reads.

The morsel split via `try_next_reader` would let us do this naturally. When a morsel planner calls `try_next_reader`, it can check the `peek_next_row_group()` statistics against the current `DynamicFilterPhysicalExpr` threshold and call `skip_next_row_group()` if the RG is prunable. This is tracked in #21399.

Happy to help implement the dynamic pruning logic on top of the morsel split once the API is wired up.
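To illustrate the runtime check, here is a minimal Rust sketch for an `ORDER BY v DESC LIMIT k` query. It is a self-contained model, not the real thing. `RowGroup`, `RowGroupReader`, and their `peek_next_row_group` / `skip_next_row_group` / `read_next_row_group` methods are hypothetical stand-ins named after the calls mentioned above. They are not the actual arrow-rs signatures. The threshold also comes from a local min-heap rather than from a `DynamicFilterPhysicalExpr`. The sketch only uses each row group's `max` statistic; an ascending sort would compare against `min` instead.

```rust
use std::cmp::Reverse;
use std::collections::BinaryHeap;

// HYPOTHETICAL types: they only mimic the peek/skip pattern described above
// and are not the real arrow-rs or DataFusion APIs.

/// One row group: `max` plays the role of the Parquet column statistic,
/// `values` is the data that is only decoded if the group is actually read.
struct RowGroup {
    max: i64,
    values: Vec<i64>,
}

/// Builds a row group and derives its `max` statistic (a real reader would
/// take this from the file metadata instead of scanning the values).
fn rg(values: Vec<i64>) -> RowGroup {
    let max = values.iter().copied().max().unwrap_or(i64::MIN);
    RowGroup { max, values }
}

/// Hypothetical reader with peek/skip/read over a sequence of row groups.
struct RowGroupReader {
    groups: Vec<RowGroup>,
    pos: usize,
}

impl RowGroupReader {
    /// Look at the next row group's statistics without decoding it.
    fn peek_next_row_group(&self) -> Option<&RowGroup> {
        self.groups.get(self.pos)
    }
    /// Advance past the next row group without decoding it.
    fn skip_next_row_group(&mut self) {
        self.pos += 1;
    }
    /// "Decode" the next row group and advance.
    fn read_next_row_group(&mut self) -> Option<Vec<i64>> {
        let values = self.groups.get(self.pos)?.values.clone();
        self.pos += 1;
        Some(values)
    }
}

/// Top-k for `ORDER BY v DESC LIMIT k`, pruning row groups between reads.
/// Returns the top-k values (largest first) and the number of skipped groups.
fn top_k_desc(reader: &mut RowGroupReader, k: usize) -> (Vec<i64>, usize) {
    if k == 0 {
        return (Vec::new(), 0);
    }
    // Min-heap holding the k largest values seen so far; its root is the threshold.
    let mut heap: BinaryHeap<Reverse<i64>> = BinaryHeap::new();
    let mut skipped = 0;
    loop {
        let next_max = match reader.peek_next_row_group() {
            Some(group) => group.max,
            None => break,
        };
        // Once k rows are buffered, a group whose max is below the current
        // threshold cannot contribute to the result, so skip it undecoded.
        if heap.len() == k {
            let Reverse(threshold) = *heap.peek().unwrap();
            if next_max < threshold {
                reader.skip_next_row_group();
                skipped += 1;
                continue;
            }
        }
        let values = reader.read_next_row_group().unwrap();
        for v in values {
            if heap.len() < k {
                heap.push(Reverse(v));
            } else if v > heap.peek().unwrap().0 {
                // v beats the current k-th largest value: replace it.
                heap.pop();
                heap.push(Reverse(v));
            }
        }
    }
    let mut top: Vec<i64> = heap.into_iter().map(|Reverse(v)| v).collect();
    top.sort_unstable_by(|a, b| b.cmp(a));
    (top, skipped)
}

fn main() {
    let groups = vec![
        rg(vec![90, 95, 100]),
        rg(vec![80, 85, 99]),
        rg(vec![10, 20, 30]),
        rg(vec![40, 50, 60]),
    ];
    let mut reader = RowGroupReader { groups, pos: 0 };
    let (top, skipped) = top_k_desc(&mut reader, 3);
    println!("top = {:?}, skipped = {}", top, skipped); // top = [100, 99, 95], skipped = 2
}
```

In this run the first two groups fill the heap and raise the threshold to 95. The last two groups (max 30 and 60) are skipped from their statistics alone, without decoding. The skip test uses a strict `<`, a conservative choice that still decodes a group whose max merely ties the threshold.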
