vshel commented on issue #5997: URL: https://github.com/apache/iceberg/issues/5997#issuecomment-1290277795
@ismailsimsek I tried running OPTIMIZE with Athena on a partition with ~25 000 files totalling 2.6GB (so a pretty small dataset). It failed with an internal error after 8 minutes. I created a support ticket for AWS to investigate, but it's not looking promising, considering the whole table is 6TB.

Additionally, after experimenting, I found that Athena read performance is horrible unless I run a compaction. I tested a small 25MB dataset: it takes Athena 50 seconds to get 100 000 records out of this 25MB Iceberg table or to do a COUNT(*), and after compaction the same retrieval and count operations take 8 seconds. Every file in the dataset has a corresponding delete, because I am doing upserts of streaming data. So it looks like upserting (delete + write) slows down Athena read performance, and compaction fixes it because it removes the deletes. I also tested performance without deletes by doing only writes while streaming this 25MB dataset, and reads took 8 seconds even without running compaction.

So Iceberg read performance on Athena looks very slow: non-Iceberg Athena tables spanning 60GB of data can run COUNT(*) in just 4 seconds, compared to 8 seconds for 25MB on Iceberg.

-- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org