bigluck commented on issue #428:
URL: https://github.com/apache/iceberg-python/issues/428#issuecomment-1962308872

   Ciao @kevinjqliu, thanks!
   
   I tested it on the same `c5ad.16xlarge` machine, but the results are 
pretty similar (27s vs 28s) for this table:
   ```
   $ pip install git+https://github.com/kevinjqliu/iceberg-python.git@kevinjqliu/bin-pack-write
   $ python3 benchmark.py
   ...
   Generating 10,000,000 records in 1,000 batches
    - generate_users: done (in 582.4395 seconds)
    - table size: 7188266296 bytes, 6.69 GB, 10,000,000 records (in 0.0002 seconds)
    - create empty table: users done (in 0.6575 seconds)
    - append data: done (in 27.7934 seconds)
   ...
   
   -rw-rw-r--. 1 ec2-user ec2-user 6.7G Feb 24 09:04 table_10000000.arrow
   ```
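
For reference, the append throughput implied by the figures above can be worked out with a small sketch. Only the byte count, record count, and append time come from the benchmark output; the variable names are illustrative and not part of `benchmark.py`.

```python
# Figures copied from the benchmark output above.
table_bytes = 7_188_266_296   # "table size: 7188266296 bytes"
records = 10_000_000          # "10,000,000 records"
append_seconds = 27.7934      # "append data: done (in 27.7934 seconds)"

# Throughput of the append step in MiB/s and records/s.
mib_per_second = table_bytes / (1024 ** 2) / append_seconds
records_per_second = records / append_seconds

print(f"{mib_per_second:.1f} MiB/s, {records_per_second:,.0f} records/s")
```

This only covers the `append data` step; it says nothing about where the time is spent within that step.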
   
   <img width="1033" alt="Screenshot 2024-02-24 at 10 05 26" src="https://github.com/apache/iceberg-python/assets/1511095/21601f5d-ed0e-4c35-8315-c14a3e49388d">
   
   
   This is the final Parquet file for the table on S3:
   
   <img width="1217" alt="Screenshot 2024-02-24 at 10 19 04" src="https://github.com/apache/iceberg-python/assets/1511095/b9a3b33d-98e5-4f16-9bab-1a6f14e4383c">


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

