thinkharderdev commented on PR #23167:
URL: https://github.com/apache/datafusion/pull/23167#issuecomment-4798451122

> Regarding stats, are they calculated at runtime? If one of the relations is a
> filtered/generated table, how are row counts calculated for partition stats?
   
What I am imagining is that operators update their stats as they process data.
Pre-execution, an operator returns partition stats based on static statistics
(as it does now), and those stats are refined as data flows through. For a
pipeline breaker, if you call `partition_statistics` once all input has been
consumed, you get back `Precision::Exact` stats because the precise values are
known.
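A minimal sketch of this idea, assuming a hypothetical `RuntimeRowCount` helper owned by an operator (not an existing DataFusion type) and a local stand-in for DataFusion's `Precision` enum. Before execution it reports the static estimate; while batches flow it reports an `Inexact` count at least as large as the rows seen; once input is exhausted it reports `Exact`. For a filtered or generated input whose static estimate is `Absent`, the observed count alone becomes the estimate.

```rust
use std::sync::atomic::{AtomicBool, AtomicUsize, Ordering};

/// Local stand-in for DataFusion's `Precision<T>` (Exact / Inexact / Absent).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Precision<T> {
    Exact(T),
    Inexact(T),
    Absent,
}

/// Hypothetical row-count tracker an operator could own (not a DataFusion type).
struct RuntimeRowCount {
    /// Estimate from static statistics, available before execution.
    static_estimate: Precision<usize>,
    /// Rows the operator has produced so far.
    observed: AtomicUsize,
    /// Set once the operator's input has been fully consumed.
    input_exhausted: AtomicBool,
}

impl RuntimeRowCount {
    fn new(static_estimate: Precision<usize>) -> Self {
        Self {
            static_estimate,
            observed: AtomicUsize::new(0),
            input_exhausted: AtomicBool::new(false),
        }
    }

    /// Called by the operator for every batch it processes.
    fn record_batch(&self, num_rows: usize) {
        self.observed.fetch_add(num_rows, Ordering::Relaxed);
    }

    /// Called by the operator when its input stream is exhausted.
    fn finish(&self) {
        // Release pairs with the Acquire load in `num_rows`, so a reader that
        // sees `true` also sees the counts the operator recorded before this call.
        self.input_exhausted.store(true, Ordering::Release);
    }

    /// What a `partition_statistics`-style call could report right now.
    fn num_rows(&self) -> Precision<usize> {
        let done = self.input_exhausted.load(Ordering::Acquire);
        let seen = self.observed.load(Ordering::Relaxed);
        if done {
            // All input has been seen: the count is known exactly.
            return Precision::Exact(seen);
        }
        match self.static_estimate {
            // An exact static value stays valid.
            Precision::Exact(n) => Precision::Exact(n),
            // Never report fewer rows than have already been observed.
            Precision::Inexact(n) => Precision::Inexact(n.max(seen)),
            // No static estimate (e.g. a generated input): use what we've seen.
            Precision::Absent if seen > 0 => Precision::Inexact(seen),
            Precision::Absent => Precision::Absent,
        }
    }
}
```

With a static estimate of `Inexact(1_000)`, recording batches of 800 and 400 rows yields `Inexact(1_200)`, and calling `finish` turns that into `Exact(1_200)`.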
   
I'm concerned that the approach outlined in this PoC is moving towards an
entirely separate, parallel optimization framework, which seems confusing.
There shouldn't really be a difference between optimizing a plan pre-execution
and optimizing it in flight since, in principle, you are just running the same
optimization passes with better statistics.
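To illustrate why pre-execution and in-flight optimization need not differ, here is a sketch of a single hypothetical rule, `choose_build_side` (not DataFusion's actual join-selection code), that picks the smaller input as the hash-join build side. The rule is a plain function of its input statistics, so the same code can run before execution on static estimates and mid-execution on refined ones; only the `Precision` values passed in change. `Precision` is again a local stand-in for DataFusion's enum.

```rust
/// Local stand-in for DataFusion's `Precision<T>`.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Precision<T> {
    Exact(T),
    Inexact(T),
    Absent,
}

impl Precision<usize> {
    /// The value if one is known, whether exact or estimated.
    fn value(&self) -> Option<usize> {
        match *self {
            Precision::Exact(n) | Precision::Inexact(n) => Some(n),
            Precision::Absent => None,
        }
    }
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum BuildSide {
    Left,
    Right,
}

/// Hypothetical rule: build the hash table on the smaller input.
/// It does not care whether the stats came from static metadata or from
/// runtime observation, so it can be re-run unchanged as stats improve.
fn choose_build_side(left_rows: Precision<usize>, right_rows: Precision<usize>) -> BuildSide {
    match (left_rows.value(), right_rows.value()) {
        (Some(l), Some(r)) if r < l => BuildSide::Right,
        // Default to the left input unless the right is known to be smaller.
        _ => BuildSide::Left,
    }
}
```

For example, static estimates of `Inexact(10_000)` vs `Inexact(50_000)` pick `Left`, while refined stats of `Inexact(200_000)` vs `Exact(50_000)` pick `Right`.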

