alamb commented on PR #21829:
URL: https://github.com/apache/datafusion/pull/21829#issuecomment-4327774458

   > must either block a thread or pre-resolve everything before planning 
begins. Neither is acceptable at scale.
   
   Could you explain why pre-resolving references doesn't work for your use case?
   
   The sync-only catalog system is by design, I think: 
https://docs.rs/datafusion/latest/datafusion/catalog/trait.CatalogProvider.html#implementing-remote-catalogs
   
   I find it hard to believe that batching resolution up front (rather than 
doing IO on demand driven by the catalog) will not work "at scale". I will 
admit it is more complicated for downstream users to implement.
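
A minimal sketch of the up-front batching pattern described above, using only the Rust standard library. Every name in it (`RemoteCatalog`, `fetch_many`, `referenced_tables`, `Snapshot`, `plan`) is hypothetical and not part of DataFusion's API. The whitespace-based SQL scanner is a toy stand-in for a real parser, and the batched fetch is synchronous here only to keep the example self-contained; in a real system it would presumably be an async network call made before planning starts.

```rust
use std::cell::Cell;
use std::collections::{BTreeSet, HashMap};

/// Hypothetical stand-in for a remote catalog service.
struct RemoteCatalog {
    schemas: HashMap<String, Vec<String>>,
    /// Counts simulated network round trips.
    round_trips: Cell<usize>,
}

impl RemoteCatalog {
    fn new() -> Self {
        Self { schemas: HashMap::new(), round_trips: Cell::new(0) }
    }

    /// Registers a table and its column names on the simulated remote side.
    fn with_table(mut self, name: &str, columns: &[&str]) -> Self {
        let columns = columns.iter().map(|c| c.to_string()).collect();
        self.schemas.insert(name.to_string(), columns);
        self
    }

    /// One batched request for every referenced table.
    /// Assumption: in real code this is the only place that does IO.
    fn fetch_many(&self, names: &BTreeSet<String>) -> HashMap<String, Vec<String>> {
        self.round_trips.set(self.round_trips.get() + 1);
        names
            .iter()
            .filter_map(|n| self.schemas.get(n).map(|cols| (n.clone(), cols.clone())))
            .collect()
    }
}

/// Step 1: collect table references before planning.
/// Toy scanner: takes the token that follows each FROM or JOIN keyword.
fn referenced_tables(sql: &str) -> BTreeSet<String> {
    let tokens: Vec<&str> = sql.split_whitespace().collect();
    tokens
        .windows(2)
        .filter(|w| w[0].eq_ignore_ascii_case("from") || w[0].eq_ignore_ascii_case("join"))
        .map(|w| w[1].trim_end_matches(|c: char| c == ';' || c == ',').to_string())
        .collect()
}

/// In-memory snapshot that a synchronous catalog lookup can read from.
struct Snapshot {
    tables: HashMap<String, Vec<String>>,
}

impl Snapshot {
    /// Synchronous lookup: no IO, only a map access.
    fn table(&self, name: &str) -> Option<&Vec<String>> {
        self.tables.get(name)
    }
}

/// Steps 2 and 3: fetch everything in one batch, then "plan" synchronously.
fn plan(sql: &str, remote: &RemoteCatalog) -> Result<Vec<String>, String> {
    let names = referenced_tables(sql);
    let snapshot = Snapshot { tables: remote.fetch_many(&names) };
    names
        .iter()
        .map(|n| {
            snapshot
                .table(n)
                .map(|cols| format!("{n}({})", cols.join(", ")))
                .ok_or_else(|| format!("table not found: {n}"))
        })
        .collect()
}
```

In this sketch the number of round trips stays at one per query regardless of how many tables the query references, which is the property the batching argument relies on. The extra work for downstream users is step 1: they have to discover the references themselves before planning.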
   
   
   You have probably seen: 
https://github.com/apache/datafusion/blob/main/datafusion-examples/examples/data_io/remote_catalog.rs
   
   As for better control over and visibility into the order of optimizer passes, 
making that easier to use does seem valuable.
   
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

