corleyma commented on issue #402:
URL: https://github.com/apache/iceberg-python/issues/402#issuecomment-2000597425

   Note from Slack: to support the larger-file use cases where folks are using PySpark/Spark, I think this would need to play well with pyarrow's streaming read/write functionality, so that one could do an atomic upsert of batches without having to read all the data into memory at once.
   
   I call this out because the current write functionality works with pyarrow Tables, which are fully materialized in memory. Working with larger data might mean making the pyiceberg write APIs accept `Iterator[RecordBatch]` and friends (as returned by pyarrow Datasets/Scanners) in addition to pyarrow Tables.
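   
   For illustration, a minimal sketch of what that might look like. The pyarrow streaming calls (`ds.dataset(...).scanner(...).to_batches()`) are real; `append_batches` is a hypothetical pyiceberg API, since the current write path takes a fully materialized `pyarrow.Table`:
   
   ```python
   from typing import Iterator
   
   import pyarrow as pa
   import pyarrow.dataset as ds
   
   def stream_batches(path: str) -> Iterator[pa.RecordBatch]:
       """Lazily yield RecordBatches via a pyarrow Dataset scanner,
       so the full dataset is never materialized in memory at once."""
       dataset = ds.dataset(path, format="parquet")
       yield from dataset.scanner(batch_size=64_000).to_batches()
   
   # Today's write path: the whole dataset must fit in memory as a Table.
   #   tbl.append(pa.Table.from_batches(stream_batches("s3://bucket/data/")))
   #
   # The suggestion above: a (hypothetical) batch-aware API that consumes
   # the iterator incrementally and commits a single atomic snapshot.
   #   tbl.append_batches(stream_batches("s3://bucket/data/"))
   ```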

