bigluck commented on issue #428: URL: https://github.com/apache/iceberg-python/issues/428#issuecomment-1962308872
Ciao @kevinjqliu, thanks!

I've tested it on the same `c5ad.16xlarge` machine, but the results are pretty similar, 27s vs 28s for this table:

```
$ pip install git+https://github.com/kevinjqliu/iceberg-python.git@kevinjqliu/bin-pack-write
$ python3 benchmark.py
...
Generating 10,000,000 records in 1,000 batches
- generate_users: done (in 582.4395 seconds)
- table size: 7188266296 bytes, 6.69 GB, 10,000,000 records (in 0.0002 seconds)
- create empty table: users done (in 0.6575 seconds)
- append data: done (in 27.7934 seconds)
...
-rw-rw-r--. 1 ec2-user ec2-user 6.7G Feb 24 09:04 table_10000000.arrow
```

<img width="1033" alt="Screenshot 2024-02-24 at 10 05 26" src="https://github.com/apache/iceberg-python/assets/1511095/21601f5d-ed0e-4c35-8315-c14a3e49388d">

This is the final table Parquet file on S3:

<img width="1217" alt="Screenshot 2024-02-24 at 10 19 04" src="https://github.com/apache/iceberg-python/assets/1511095/b9a3b33d-98e5-4f16-9bab-1a6f14e4383c">
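For reference, here is a rough sketch of the write throughput implied by the numbers in the benchmark output. It assumes the `append data` timing covers the whole write, and that the reported size is the in-memory Arrow size rather than the size of the Parquet file on S3. The `throughput` helper is hypothetical and not part of `benchmark.py`.

```python
# Figures copied from the benchmark output in this comment.
TABLE_BYTES = 7_188_266_296   # "table size" (Arrow data, not the Parquet file)
RECORDS = 10_000_000          # number of generated records
APPEND_SECONDS = 27.7934      # "append data" wall-clock time


def throughput(table_bytes: int, records: int, seconds: float) -> tuple[float, float]:
    """Return (MiB per second, rows per second) for a single append."""
    mib_per_s = table_bytes / (1024 ** 2) / seconds
    rows_per_s = records / seconds
    return mib_per_s, rows_per_s


mib_per_s, rows_per_s = throughput(TABLE_BYTES, RECORDS, APPEND_SECONDS)
print(f"{mib_per_s:.0f} MiB/s, {rows_per_s:,.0f} rows/s")
```

That works out to roughly 247 MiB/s, or about 360,000 rows per second, for the append step.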