mattmartin14 commented on PR #1534:
URL: https://github.com/apache/iceberg-python/pull/1534#issuecomment-2608311564

   Also @kevinjqliu, to address your question on DataFusion: when I looked into this feature, I explored these 3 options for an Arrow processing engine:
   
   1. DuckDB
   2. DataFusion
   3. Daft
   
   I ultimately decided that DataFusion made the most sense, given these things it had going for it:
   
   - It's already an Apache Software Foundation project, so licensing would be a non-issue.
   - It's very lightweight and uses Apache Arrow as its in-memory format, so it can process and query Arrow tables directly.
   - It's Rust-based, so if PyIceberg is ultimately migrated to iceberg-rust one day, the integrations would be easier.
   - The iceberg-rust project is already building integrations for it, as seen [here](https://github.com/apache/iceberg-rust/tree/main/crates/integrations/datafusion).
   
   Hope this explains how I arrived at that conclusion. Using native PyArrow alone to process the data would be a very large uphill battle, as we would effectively have to build our own data processing engine on top of it (e.g. hash joins, sorting, optimizations). It doesn't make sense to reinvent the wheel when we can use an engine that already exists (DataFusion) and put it to good use.
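
   To give a sense of scale, below is a minimal sketch of just one of those primitives: an inner hash join on a single key column. It is written in plain Python over lists of dicts rather than Arrow arrays. `hash_join`, `target`, and `source` are hypothetical names used only for illustration and are not part of PyIceberg. A usable version would also need vectorized execution over Arrow buffers, multi-column keys, outer and anti join variants, and memory management. That is the work an engine like DataFusion already provides.

```python
from collections import defaultdict
from typing import Any, Dict, Hashable, Iterable, List, Tuple

# Illustrative only: a "row" is a dict of column name -> value.
# A real engine would operate on columnar Arrow arrays instead.
Row = Dict[str, Any]


def hash_join(left: Iterable[Row], right: Iterable[Row], key: str) -> List[Tuple[Row, Row]]:
    """Hypothetical helper: inner equi-join of two row collections on one key column."""
    # Build phase: bucket every right-hand row by its join key.
    buckets: Dict[Hashable, List[Row]] = defaultdict(list)
    for row in right:
        k = row[key]
        if k is not None:  # SQL semantics: NULL keys never match
            buckets[k].append(row)

    # Probe phase: look up each left-hand row and emit one pair per match.
    out: List[Tuple[Row, Row]] = []
    for row in left:
        k = row[key]
        if k is None:
            continue
        # .get() on a defaultdict does not insert a new empty bucket.
        for match in buckets.get(k, ()):
            out.append((row, match))
    return out


if __name__ == "__main__":
    target = [{"id": 1, "v": "a"}, {"id": 2, "v": "b"}]
    source = [{"id": 2, "v": "B"}, {"id": 3, "v": "c"}]
    print(hash_join(source, target, key="id"))
    # [({'id': 2, 'v': 'B'}, {'id': 2, 'v': 'b'})]
```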
   
   I looked at the attachment you posted for upcoming PyIceberg sync meetings but did not see any 2025 meetings listed. I'd be glad to attend one to discuss this further if needed.
   
   Thanks,
   Matt


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org
