Re: [PR] ParallelIterable: Queue Size w/ O(1) [iceberg]

via GitHub Wed, 01 Jan 2025 03:56:30 -0800


shanielh commented on PR #11895:
URL: https://github.com/apache/iceberg/pull/11895#issuecomment-2566977078


   > LGTM as well! Thank you for the fix!
   > 
   > > have a JFR dump that shows this method uses 35% CPU utilization, this
   > > is why I think this commit is important
   > 
   > Interesting, the queue must really be huge. Do you know the manifest size / count we are looking at, or more details of the table state?
   
   Actually, I was using `ParallelIterable` to read multiple Parquet files in order to compact them, and to scan manifest files.
   
   The table had 180 manifest files tracking a very large number of data files:
   
   ```sql
   select count(*), 
          sum(added_data_files_count), 
          sum(existing_data_files_count), 
          sum(deleted_data_files_count) 
     from schema."table$manifests";
   ```
   
   | count(*) | sum(added_data_files_count) | sum(existing_data_files_count) | sum(deleted_data_files_count) |
   |----------|-----------------------------|--------------------------------|-------------------------------|
   | 180      | 1826                        | 2703684                        | 6844                          |
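   For context on why a size check can show up so prominently in a JFR profile: the JDK Javadoc for `java.util.concurrent.ConcurrentLinkedQueue.size()` warns that it is not a constant-time operation and requires an O(n) traversal. Calling it frequently while many entries (here, roughly 2.7 million existing data files across the manifests) flow through the buffer can therefore dominate CPU time. The sketch below shows the general idea behind an O(1) queue size: keep an `AtomicInteger` counter next to the queue. It is a minimal illustration under that assumption, not the PR's actual diff; the class name `CountingQueue` is hypothetical.

   ```java
   import java.util.concurrent.ConcurrentLinkedQueue;
   import java.util.concurrent.atomic.AtomicInteger;

   /**
    * Hypothetical wrapper (not Iceberg's class): a lock-free queue whose size
    * can be read in O(1) by maintaining a separate atomic counter.
    */
   final class CountingQueue<T> {
     private final ConcurrentLinkedQueue<T> queue = new ConcurrentLinkedQueue<>();
     // Counter updated on every successful offer/poll. Under concurrent access
     // it may briefly differ from the true element count, which is usually
     // acceptable for a soft buffer limit.
     private final AtomicInteger size = new AtomicInteger();

     /** Adds an element, then increments the counter. */
     public void offer(T element) {
       // ConcurrentLinkedQueue is unbounded: offer never returns false, and a
       // null element throws before the counter is touched.
       queue.offer(element);
       size.incrementAndGet();
     }

     /** Removes the head, decrementing the counter only if an element was removed. */
     public T poll() {
       T element = queue.poll();
       if (element != null) {
         size.decrementAndGet();
       }
       return element;
     }

     /** O(1), unlike ConcurrentLinkedQueue.size(), which walks every node. */
     public int size() {
       return size.get();
     }
   }
   ```

   For example, three `offer` calls followed by one `poll` leave `size()` at 2 without traversing the queue.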


