vshel commented on issue #5997:
URL: https://github.com/apache/iceberg/issues/5997#issuecomment-1290277795

   @ismailsimsek I tried running OPTIMIZE with Athena on a partition with 
~25,000 files totalling 2.6GB (so a pretty small dataset). It failed with an 
internal error after 8 minutes. I created a support ticket for AWS to 
investigate, but it's not looking promising, considering the whole table 
dataset is 6TB.
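
   For scale, here is a minimal sketch of the average file size implied by 
the figures above. The decimal unit conversion (1 GB = 1,000,000 KB) and the 
helper name are assumptions made for illustration, not something measured on 
the table.

```python
# Figures quoted in the comment above.
# Assumption: decimal units, i.e. 1 GB = 1,000,000 KB.
FILE_COUNT = 25_000
PARTITION_SIZE_GB = 2.6


def average_file_size_kb(total_gb: float, file_count: int) -> float:
    """Return the mean file size in KB for a partition (hypothetical helper)."""
    return total_gb * 1_000_000 / file_count


avg_kb = average_file_size_kb(PARTITION_SIZE_GB, FILE_COUNT)
print(f"average file size: {avg_kb:.0f} KB")
```

   That works out to roughly 104 KB per file, so the partition is made up of 
a large number of very small files.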
   
   Additionally, after experimenting, Athena read performance is horrible 
unless I do a compaction. I tested a small 25MB dataset: it takes Athena 50 
seconds to get 100,000 records out of this 25MB Iceberg table or to do a 
COUNT(*), and after compaction it takes 8 seconds for Athena to do the same 
retrieval and count operations.
   All files in the dataset have a corresponding delete, because I am doing 
upserts of streaming data. So it looks like upserting (delete + write) slows 
down Athena read performance, and compaction fixes it because it removes the 
deletes. I tested performance without deletes by doing only writes while 
streaming this 25MB dataset, and read performance was 8 seconds even without 
running compaction.
   
   So Iceberg read performance on Athena looks very slow, considering that 
non-Iceberg Athena tables that span 60GB of data can run COUNT(*) in just 4 
seconds, compared to Iceberg's 8 seconds for 25MB.
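
   To put the timings above side by side, here is a rough sketch that turns 
them into effective data rates. The labels and the decimal conversion 
(60GB = 60,000MB) are assumptions for illustration. The rates are 
end-to-end (data size divided by wall-clock time), so they include any fixed 
query overhead and are not a measure of raw scan speed.

```python
# (size in MB, wall-clock seconds) as quoted in the comment.
# Assumption: decimal units, so 60 GB is treated as 60,000 MB.
measurements = {
    "iceberg_with_deletes": (25, 50),
    "iceberg_compacted": (25, 8),
    "non_iceberg": (60_000, 4),
}


def effective_rate_mb_per_s(size_mb: float, seconds: float) -> float:
    """Data size divided by wall-clock query time."""
    return size_mb / seconds


rates = {
    name: effective_rate_mb_per_s(size_mb, seconds)
    for name, (size_mb, seconds) in measurements.items()
}

for name, rate in rates.items():
    print(f"{name}: {rate:,.3f} MB/s")

# Speedup from compaction on the same 25MB dataset.
compaction_speedup = 50 / 8
print(f"compaction speedup: {compaction_speedup:.2f}x")
```

   On these numbers, compaction makes the 25MB query about 6x faster, while 
the non-Iceberg table is processed at an effective rate several thousand 
times higher than either Iceberg case.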


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

