adriangb commented on PR #21996: URL: https://github.com/apache/datafusion/pull/21996#issuecomment-4372503809
Thanks @asolimando! Thinking about it a bit more, the axis that makes the split hard in DataFusion is that DataFusion is often federated to, or federates to, other systems. E.g. if the TableProvider is a Postgres table, it's completely reasonable that there is overlap in functionality (Postgres also has a planner, a stats system, etc.). But DataFusion also talks to dumb Parquet files...

> rather than pushing expression semantics into the provider, which might have their own very different notion (classic impedance mismatch in DB systems)

I think the key / nice thing would be to allow the implementer to decide what it can and can't provide (which I've kind of attempted in this PR). The next step would be to let it provide statistics for some parts of expressions and not others (i.e. split up the expression tree).
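
To make the "split up the expression tree" idea concrete, here is a minimal, self-contained sketch. It is not DataFusion code: `Expr`, `StatsSource`, `SimpleProvider`, `estimate`, and `DEFAULT_SELECTIVITY` are all hypothetical names invented for illustration, and the selectivity numbers are made up. The provider answers only for the sub-expressions it understands, and the caller falls back to its own default for the rest. Multiplying selectivities across an AND assumes the two sides are independent, which is a simplification.

```rust
// Toy expression tree. Hypothetical; not DataFusion's `Expr`.
#[derive(Debug, Clone, PartialEq)]
enum Expr {
    Column(String),
    Literal(i64),
    Gt(Box<Expr>, Box<Expr>),
    And(Box<Expr>, Box<Expr>),
    // An opaque function the provider may know nothing about.
    Udf(String, Vec<Expr>),
}

// Hypothetical trait: the implementer decides what it can and can't provide.
trait StatsSource {
    // Some(selectivity in [0, 1]) if the source can estimate `expr` as a whole,
    // None if it has no opinion.
    fn selectivity(&self, expr: &Expr) -> Option<f64>;
}

// A provider that only understands `column > literal`.
struct SimpleProvider;

impl StatsSource for SimpleProvider {
    fn selectivity(&self, expr: &Expr) -> Option<f64> {
        match expr {
            Expr::Gt(l, r) => match (l.as_ref(), r.as_ref()) {
                // Made-up constant; a real source would consult its own stats.
                (Expr::Column(_), Expr::Literal(_)) => Some(0.25),
                _ => None,
            },
            _ => None,
        }
    }
}

// Fallback used when the provider declines (assumed value).
const DEFAULT_SELECTIVITY: f64 = 0.5;

// Ask the provider first; if it declines, split conjunctions and recurse so
// that each side can be answered by the provider or by the default.
fn estimate(source: &dyn StatsSource, expr: &Expr) -> f64 {
    if let Some(s) = source.selectivity(expr) {
        return s;
    }
    match expr {
        // Independence assumption: multiply the two sides.
        Expr::And(l, r) => estimate(source, l) * estimate(source, r),
        _ => DEFAULT_SELECTIVITY,
    }
}
```

For a predicate like `a > 10 AND my_udf(b)`, the provider declines the conjunction as a whole, answers for `a > 10`, and the default covers `my_udf(b)`. A provider backed by a system with its own planner (e.g. Postgres) could instead return `Some(..)` for the whole tree, so the same interface covers both ends of the spectrum.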
