thinkharderdev commented on PR #23167: URL: https://github.com/apache/datafusion/pull/23167#issuecomment-4798451122
> Reg to stats, are they calculated on runtime? if one of the relationship is filtered/generated table, how rowcounts calculated for partition stats?

What I am imagining is that operators update their stats as they process data. Pre-execution, an operator returns partition stats based on static statistics (as it does now), but as data flows through, those stats are updated. For a pipeline breaker, calling `partition_statistics` once all input has been consumed would return `Precision::Exact` stats, because the precise values are known by then.

I'm concerned that the approach outlined in this PoC is moving towards an entirely separate, parallel optimization framework, which seems confusing. There shouldn't really be a difference between optimizing a plan pre-execution and optimizing it in-flight: in principle you are running the same optimization passes, just with better statistics.
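
To make the idea of runtime-updated statistics concrete, here is a minimal sketch of the row-count part only. It is not DataFusion code: `Precision` below is a simplified local stand-in for DataFusion's enum of the same name, and `RuntimeRowStats`, `record_batch`, `finish_input`, and `num_rows` are hypothetical names invented for illustration.

```rust
use std::sync::atomic::{AtomicBool, AtomicUsize, Ordering};

/// Simplified local stand-in for DataFusion's `Precision`, specialised to
/// row counts so the sketch compiles without the crate.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Precision {
    Exact(usize),
    Inexact(usize),
    Absent,
}

/// Hypothetical per-partition state kept by a pipeline-breaking operator.
pub struct RuntimeRowStats {
    /// Estimate derived from static statistics at planning time.
    static_estimate: Precision,
    /// Rows actually observed so far during execution.
    rows_seen: AtomicUsize,
    /// Set once the input stream has been fully consumed.
    input_done: AtomicBool,
}

impl RuntimeRowStats {
    pub fn new(static_estimate: Precision) -> Self {
        Self {
            static_estimate,
            rows_seen: AtomicUsize::new(0),
            input_done: AtomicBool::new(false),
        }
    }

    /// Called by the operator for every input batch it processes.
    pub fn record_batch(&self, num_rows: usize) {
        self.rows_seen.fetch_add(num_rows, Ordering::SeqCst);
    }

    /// Called by the operator when its input is exhausted.
    pub fn finish_input(&self) {
        self.input_done.store(true, Ordering::SeqCst);
    }

    /// What a `partition_statistics`-style call could report for the row
    /// count: the static estimate while input is still flowing, and an
    /// exact count once all input has been consumed.
    pub fn num_rows(&self) -> Precision {
        if self.input_done.load(Ordering::SeqCst) {
            Precision::Exact(self.rows_seen.load(Ordering::SeqCst))
        } else {
            self.static_estimate
        }
    }
}
```

Under these assumptions, an optimizer pass would call the same accessor before and during execution; only the precision of the answer changes, which is the point about not needing a separate in-flight framework.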
