mbutrovich commented on PR #3932: URL: https://github.com/apache/datafusion-comet/pull/3932#issuecomment-4237747957
This is awesome @schenksj! Thank you! At 6,500 lines, I'd like to take some time to review this one in stages. Without looking too closely at it yet, the first questions that come to mind that I want to look at first: 1. Does Delta have a Spark test suite we can run, similar to what we do with Iceberg's Java library to have confidence in the implementation's correctness? 2. Are there significant downstream dependencies we pick up with this change? We're already struggling a bit with `iceberg-rust` being locked to specific DataFusion and Arrow-rs versions. Does this bring the same challenge, and would Comet be limited from upgrading until _all_ dependencies are in sync? Like @andygrove, I am mostly concerned about the maintenance burden. Though perhaps I am more concerned about future Comet changes than I am about maintaining this new delta code. I am imagining future major Comet changes like rewriting our rules to run later to be compatible with AQE improvements in Spark 4.0+, and this delta integration becomes something we have to update or leave behind. I don't think any of this should be disqualifying from a merge, but it's another reason I want to sit with the PR for a bit. I'd like to try to imagine ways we could be possibly boxed in by this code. Thank you again for the contribution! I am looking forward to digging into it this week. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] --------------------------------------------------------------------- To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
