alamb commented on PR #21829: URL: https://github.com/apache/datafusion/pull/21829#issuecomment-4327774458
> must either block a thread or pre-resolve everything before planning begins. Neither is acceptable at scale.

Could you explain why pre-resolving references doesn't work for your use case?

The sync-only catalog system is by design, I think: https://docs.rs/datafusion/latest/datafusion/catalog/trait.CatalogProvider.html#implementing-remote-catalogs

I find it hard to believe that batching resolution up front (rather than doing IO on demand driven by the catalog) will not work "at scale". I will admit it is more complicated to implement for downstream users.

You have probably seen: https://github.com/apache/datafusion/blob/main/datafusion-examples/examples/data_io/remote_catalog.rs

As for better control/visibility into the order of optimizer passes, making that easier to use does seem valuable.
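
For readers unfamiliar with the pattern, here is a rough, std-only sketch of what "batching resolution up front" could look like. It is not DataFusion's API: `RemoteCatalog`, `ResolvedCatalog`, `TableMeta`, `fetch_batch`, and `pre_resolve` are all hypothetical names, and the remote call is simulated in memory. The linked `remote_catalog.rs` example is the place to look for the real thing.

```rust
use std::collections::{BTreeSet, HashMap};

/// Hypothetical stand-in for table metadata fetched from a remote catalog.
#[derive(Debug, Clone, PartialEq)]
struct TableMeta {
    name: String,
    columns: Vec<String>,
}

/// Hypothetical remote catalog client. In a real system `fetch_batch` would be
/// an async network call; here it is simulated in memory and counts round trips.
struct RemoteCatalog {
    tables: HashMap<String, Vec<String>>,
    round_trips: usize,
}

impl RemoteCatalog {
    /// Resolve many names in a single (simulated) round trip.
    /// Names the catalog does not know are simply absent from the result.
    fn fetch_batch(&mut self, names: &BTreeSet<String>) -> HashMap<String, TableMeta> {
        self.round_trips += 1;
        names
            .iter()
            .filter_map(|n| {
                self.tables.get(n).map(|cols| {
                    let meta = TableMeta { name: n.clone(), columns: cols.clone() };
                    (n.clone(), meta)
                })
            })
            .collect()
    }
}

/// Sync-only snapshot used during planning: lookups never perform I/O.
struct ResolvedCatalog {
    tables: HashMap<String, TableMeta>,
}

impl ResolvedCatalog {
    fn table(&self, name: &str) -> Option<&TableMeta> {
        self.tables.get(name)
    }
}

/// Deduplicate the referenced names (a real planner would collect them by
/// walking the parsed statement), resolve them in one batch, and return a
/// snapshot that planning can query synchronously.
fn pre_resolve(remote: &mut RemoteCatalog, referenced: &[&str]) -> ResolvedCatalog {
    let names: BTreeSet<String> = referenced.iter().map(|s| s.to_string()).collect();
    ResolvedCatalog { tables: remote.fetch_batch(&names) }
}
```

The point of the sketch is that all I/O happens once, before planning, no matter how many times a table is referenced; the planner then only ever sees the in-memory `ResolvedCatalog`.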
