ajantha-bhat commented on PR #9437:
URL: https://github.com/apache/iceberg/pull/9437#issuecomment-1976956134

   I ran some benchmarks using `FileGenerationUtil` (the changes are in this 
PR, see `TestPartitionStatsPerf`). 
   **The local algorithm is faster than the distributed one in both cases 
below.** 
   
   ```
   case 1: FileGenerationUtil.generateDataFile took 30 minutes to generate 10k 
partitions with 2 data file entries per partition.
   
   1.4 seconds - local algorithm
   3.3 seconds - distributed algorithm
   
   case 2: FileGenerationUtil.generateDataFile took 25 seconds to generate 20 
partitions with 10k data file entries per partition.
   
   1.7 seconds - local algorithm
   4.1 seconds - distributed algorithm
   ```
   Note: For case 1, I could increase the number of partitions further, but the 
data generation then takes hours. I will try it overnight. 
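
For a quick read of the numbers above, here is a minimal sketch that computes the distributed-to-local time ratio for each case. The class and method names (`SpeedupSketch`, `speedup`) are hypothetical and not part of the PR; the inputs are the four timings reported above. Both ratios come out at roughly 2.4x.

```java
import java.util.Locale;

public class SpeedupSketch {

    // Ratio of distributed to local wall-clock time; a value above 1 means
    // the local algorithm was faster. Hypothetical helper, not from the PR.
    public static double speedup(double distributedSeconds, double localSeconds) {
        if (localSeconds <= 0) {
            throw new IllegalArgumentException("localSeconds must be positive");
        }
        return distributedSeconds / localSeconds;
    }

    public static void main(String[] args) {
        // Timings copied from the benchmark results in this comment.
        System.out.printf(Locale.ROOT, "case 1 (10k partitions x 2 files): %.2fx%n",
                speedup(3.3, 1.4));
        System.out.printf(Locale.ROOT, "case 2 (20 partitions x 10k files): %.2fx%n",
                speedup(4.1, 1.7));
    }
}
```

These are single measurements, so the ratios only indicate the direction and rough size of the gap.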


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

