schenksj commented on PR #3932:
URL: 
https://github.com/apache/datafusion-comet/pull/3932#issuecomment-4238055079

   > This is awesome @schenksj! Thank you!
   > 
   > At 6,500 lines, I'd like to take some time to review this one in stages. 
Without looking too closely at it yet, the first questions that come to mind 
that I want to look at first:
   > 
   > 1. Does Delta have a Spark test suite we can run, similar to what we do 
with Iceberg's Java library to have confidence in the implementation's 
correctness?
   > 2. Are there significant downstream dependencies we pick up with this 
change? We're already struggling a bit with `iceberg-rust` being locked to 
specific DataFusion and Arrow-rs versions. Does this bring the same challenge, 
and would Comet be limited from upgrading until _all_ dependencies are in sync?
   > 
   > Like @andygrove, I am mostly concerned about the maintenance burden. 
Though perhaps I am more concerned about future Comet changes than I am about 
maintaining this new delta code. I am imagining future major Comet changes like 
rewriting our rules to run later to be compatible with AQE improvements in 
Spark 4.0+, and this delta integration becomes something we have to update or 
leave behind. I don't think any of this should be disqualifying from a merge, 
but it's another reason I want to sit with the PR for a bit. I'd like to try to 
imagine ways we could be possibly boxed in by this code.
   > 
   > Thank you again for the contribution! I am looking forward to digging into 
it this week.
   
   Happy to answer any questions you have.  Fortunately, I think most of the 
actual code is test cases.
   
   1. Will look into the whether there is a delta spark acceptance suite like 
the Iceberg one you mentioned
   2. Dependency creep... This is using semver-based crate identity, so both 
versions of arrow co-exist.  I suspect that given that we're not integrating 
FFI from delta-rs, that we might be able to force use of the newer arrow 
without issue.   Will look into it.  If not, the tradeoff is binary size but 
not symbol conflicts (thank you rust).
   
    ## Downstream Dependencies Added by This PR                                 
                                                                                
                                            
                                                       
     ### Direct Dependencies (Cargo.toml)                                       
                                                                                
                                             
      
     | Crate | Version | Purpose |                                              
                                                                                
                                             
     |-------|---------|---------|                                        
     | `delta_kernel` | 0.19 | Delta log replay (scan planning only) |          
                                                                                
                                             
     | `object_store` | 0.12 (renamed `object_store_kernel`) | Required by 
kernel's engine (S3/Azure) |
     | `roaring` | 0.10 | Deletion vector bitmap decoding |                     
                                                                                
                                             
     | `thiserror` | workspace | Error type derive (already in workspace, added 
to core's deps) |                                                               
                                             
      
     ### Transitive: Second Versions of Existing Crates (16)                    
                                                                                
                                             
                                                                          
     Kernel pins arrow-57 / parquet-57 / object_store-0.12 internally. These 
coexist alongside Comet's arrow-58 / parquet-58 / object_store-0.13. No types 
cross the boundary.                               
                                                                          
     `arrow`, `arrow-arith`, `arrow-array`, `arrow-buffer`, `arrow-cast`, 
`arrow-csv`, `arrow-data`, `arrow-ipc`, `arrow-json`, `arrow-ord`, `arrow-row`, 
`arrow-schema`, `arrow-select`, `arrow-string`,    
     `parquet`, `object_store`                                            
                                                                                
                                                                                
                                             
     ### Truly New Crates (10)                                            
   
     | Crate | Pulled by | Purpose |
     |-------|-----------|---------|
     | `delta_kernel` | direct | Core dependency |
     | `delta_kernel_derive` | delta_kernel | Proc macros for kernel |
     | `roaring` | direct | Bitmap codec for deletion vectors |                 
                                                                                
                                             
     | `crc` / `crc-catalog` | delta_kernel | Checksum validation for 
checkpoints |
     | `lz4_flex` | delta_kernel | Log entry compression |                      
                                                                                
                                             
     | `z85` | delta_kernel | Deletion vector encoding |                  
     | `document-features` | delta_kernel | Doc generation |                    
                                                                                
                                             
     | `litrs` | delta_kernel_derive | Literal parsing for proc macros |        
                                                                                
                                             
     | `rustls-pemfile` | kernel's rustls engine | TLS certificate loading |
     | `crossterm` / `crossterm_winapi` | delta_kernel | Transitive dev 
dependency |                                                                    
                                                     
                                                                                
                                                                                
                                             
     ### Java/Scala Dependencies
                                                                                
                                                                                
                                             
     None added to production. Delta-spark is test-scope only (unchanged from 
before this PR).     


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to