[I] Parallel Table.append [iceberg-python]

via GitHub Wed, 14 Feb 2024 02:30:25 -0800


bigluck opened a new issue, #428:
URL: https://github.com/apache/iceberg-python/issues/428


   ### Apache Iceberg version
   
   main (development)
   
   ### Please describe the bug 🐞
   
   While testing the latest RC (`v0.6.0rc5`), I generated a ~6.7 GB Arrow table and appended it to a new table.
   
   In terms of performance, I got similar results (writing to S3) on these two types of EC2 machines:
   - `c5ad.8xlarge`: 32 cores, 64 GB RAM, 10 Gbps NIC -> **wrote 1 Parquet file of 2 GB in 31 s**
   - `c5ad.16xlarge`: 64 cores, 128 GB RAM, 20 Gbps NIC -> **wrote 1 Parquet file of 1.6 GB in 28 s**
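Converting these figures into effective throughput shows how far the writes are from the network's capacity. The sketch below is a rough back-of-the-envelope check: it assumes decimal units (1 GB = 1000 MB, 1 Gbps = 125 MB/s) and uses the Parquet file sizes reported above, not the in-memory Arrow size.

```python
def throughput_mb_s(size_gb: float, seconds: float) -> float:
    """Effective write throughput in MB/s, assuming 1 GB = 1000 MB."""
    return size_gb * 1000 / seconds


def nic_capacity_mb_s(gbps: float) -> float:
    """Nominal NIC bandwidth in MB/s (8 bits per byte, decimal units)."""
    return gbps * 1000 / 8


# (instance, Parquet file size in GB, seconds, NIC bandwidth in Gbps) from the runs above
runs = [
    ("c5ad.8xlarge", 2.0, 31, 10),
    ("c5ad.16xlarge", 1.6, 28, 20),
]

for name, size_gb, secs, gbps in runs:
    rate = throughput_mb_s(size_gb, secs)
    share = rate / nic_capacity_mb_s(gbps)
    print(f"{name}: {rate:.1f} MB/s, {share:.1%} of nominal NIC bandwidth")
```

Both runs land at roughly 57 to 65 MB/s, a few percent of the nominal NIC bandwidth, and doubling the machine size does not raise the rate. That pattern is consistent with the write being bound by a single thread rather than by the network.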
   
   Using `htop`, I noticed that only a single thread was busy during the append operation, which means the write is not being parallelized.
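One way to use more cores would be to split the Arrow table into contiguous row slices and write each slice as its own Parquet file from a thread pool. The sketch below is a hypothetical illustration of that pattern, not PyIceberg's API: `split_ranges` and `parallel_write` are names introduced here, and `write_slice` is a caller-supplied function. In a real setup it might call `pyarrow.parquet.write_table(table.slice(offset, length), path)`. Threads only help if that writer releases the GIL while encoding and uploading, which is an assumption to verify.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Tuple, TypeVar

T = TypeVar("T")


def split_ranges(num_rows: int, num_slices: int) -> List[Tuple[int, int]]:
    """Split [0, num_rows) into at most num_slices contiguous (offset, length) ranges."""
    if num_rows <= 0 or num_slices <= 0:
        return []
    num_slices = min(num_slices, num_rows)
    base, extra = divmod(num_rows, num_slices)
    ranges = []
    offset = 0
    for i in range(num_slices):
        # The first `extra` slices get one additional row so every row is covered.
        length = base + (1 if i < extra else 0)
        ranges.append((offset, length))
        offset += length
    return ranges


def parallel_write(
    num_rows: int,
    write_slice: Callable[[int, int], T],
    max_workers: int = 8,
) -> List[T]:
    """Hypothetical helper: run write_slice(offset, length) for each slice concurrently.

    Results come back in slice order; each slice would become a separate data file.
    """
    ranges = split_ranges(num_rows, max_workers)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [pool.submit(write_slice, off, length) for off, length in ranges]
        # .result() re-raises any exception raised inside a worker thread.
        return [f.result() for f in futures]


# Usage with a stand-in writer that only records which rows it would write.
written = parallel_write(10, lambda off, n: (off, n), max_workers=4)
print(written)  # [(0, 3), (3, 3), (6, 2), (8, 2)]
```

Splitting by row count keeps the sketch simple; splitting by bytes would track a target file size more closely when rows vary in width.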
   
   ![Screenshot 2024-02-13 at 14 26 35](https://github.com/apache/iceberg-python/assets/1511095/d0139456-c7ff-4270-a48e-707450b831ef)
   
   


