koenvo commented on PR #1995: URL: https://github.com/apache/iceberg-python/pull/1995#issuecomment-2932336792
> fwiw I think we should try to get this merged in at some point. Some ideas:
>
> 1. Make it a flag to use the batchreader or not, some users might have basically infinite memory
> 2. Is there a more optimal way to batch data? Thinking along the lines of using partitions although that may already happen under the hood

I've been thinking about what I (as a developer) want. The answer is: set a maximum memory usage. Some ideas:

1. Determine which partitions can fit together in memory and batch-load those together
2. Fetch parquet files in parallel and only do the loading sequentially
3. Combine 1 and 2

--
This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at: us...@infra.apache.org
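The memory-capped batching described in ideas 1–3 could be sketched roughly as below. This is a hypothetical illustration, not PyIceberg code: `plan_batches`, `read_batches`, and the `fetch`/`load` callables are made-up names, and in practice the file sizes would come from the table scan's manifest entries or scan tasks.

```python
from concurrent.futures import ThreadPoolExecutor

def plan_batches(file_sizes, max_memory):
    """Greedily group files so each batch's total size stays under max_memory.

    A file larger than max_memory ends up alone in its own batch.
    """
    batches, current, current_size = [], [], 0
    for size in file_sizes:
        if current and current_size + size > max_memory:
            batches.append(current)
            current, current_size = [], 0
        current.append(size)
        current_size += size
    if current:
        batches.append(current)
    return batches

def read_batches(paths_with_sizes, max_memory, fetch, load):
    """Idea 3: fetch each batch's files in parallel, then load sequentially.

    fetch/load are placeholders for downloading a parquet file and
    decoding it into an in-memory table.
    """
    paths = [p for p, _ in paths_with_sizes]
    sizes = [s for _, s in paths_with_sizes]
    i = 0
    with ThreadPoolExecutor(max_workers=4) as pool:
        for batch in plan_batches(sizes, max_memory):
            group = paths[i:i + len(batch)]
            i += len(batch)
            buffers = list(pool.map(fetch, group))  # parallel download
            for buf in buffers:                     # sequential decode
                yield load(buf)
```

The greedy grouping keeps each batch's resident set under the user-supplied cap, while the thread pool overlaps network I/O across files in the same batch; only the memory-heavy decode step stays sequential.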