[GitHub] [iceberg] szehon-ho commented on issue #7431: iceberg mor table execute merge very very slow

via GitHub Wed, 10 May 2023 15:40:16 -0700


szehon-ho commented on issue #7431:
URL: https://github.com/apache/iceberg/issues/7431#issuecomment-1542897084


   Hi, thanks.  Yeah, I am comparing the two stage trees, and nothing immediately 
jumps out at me to explain why MOR takes 40 mins and COW takes 17 mins.
   
   Comparing the two joins, 
   * MOR join has shuffle bytes: 207.6 GB and 50.3 GB.  
   * COW has two joins (the first determines the list of files, the second is the 
actual join with a filter on the file list)
       * 1st join has shuffle bytes: 10.3 GB and 7.0 GB (file projection 
calculation)
       * 2nd join has shuffle bytes: 156.9 GB and 109.3 GB (final join after file filter)
   
   So just based on that, I don't see a huge difference here.
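
   To make that comparison concrete, here is a rough tally of the shuffle 
figures quoted above. It is only a sketch: it assumes "G" means GB and that 
simply summing the two sides of each join is a fair proxy for shuffle volume. 
The variable names are made up for illustration.

```python
# Shuffle bytes (GB) read off the two stage trees, as quoted in this comment.
mor_join = [207.6, 50.3]
cow_file_join = [10.3, 7.0]      # 1st COW join: file projection calculation
cow_final_join = [156.9, 109.3]  # 2nd COW join: final join after file filter

mor_total = sum(mor_join)
cow_total = sum(cow_file_join) + sum(cow_final_join)
shuffle_ratio = mor_total / cow_total
runtime_ratio = 40 / 17  # reported wall-clock minutes, MOR vs COW

print(f"MOR total shuffle: {mor_total:.1f} GB")
print(f"COW total shuffle: {cow_total:.1f} GB")
print(f"MOR/COW shuffle ratio: {shuffle_ratio:.2f}")
print(f"MOR/COW runtime ratio: {runtime_ratio:.2f}")
```

   Under those assumptions MOR shuffles slightly less than COW in total, so 
shuffle volume alone does not account for MOR taking more than twice as long.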
   
   Also, I notice that the data for the two runs may be different: you have 785 
million and 499 million rows for COW, and 261 million and 498 million rows for 
MOR.
   
   I am not a huge expert on the Spark UI, but is there somewhere you can see how 
long each stage takes?  Hopefully you don't have to re-run both jobs and can just 
get it from the Spark History Server.
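
   If the History Server is reachable, one possible way to get per-stage times 
without re-running is Spark's monitoring REST API 
(`/api/v1/applications/[app-id]/stages`). The sketch below is untested against 
your cluster: the host, port and application id are placeholders, and the field 
names (`stageId`, `name`, `submissionTime`, `completionTime`) and timestamp 
format are what I recall from the Spark monitoring docs, so please verify them 
against your Spark version.

```python
import json
from datetime import datetime
from urllib.request import urlopen

# Timestamp layout assumed for the REST API, e.g. "2023-05-10T15:40:16.000GMT".
TS_FORMAT = "%Y-%m-%dT%H:%M:%S.%fGMT"


def fetch_stages(base_url, app_id):
    """Download the stage list for one application from the History Server."""
    url = f"{base_url}/api/v1/applications/{app_id}/stages"
    with urlopen(url) as resp:
        return json.load(resp)


def stage_durations(stages):
    """Return (stageId, name, seconds) tuples, slowest stage first.

    Stages without both timestamps (e.g. skipped stages) are ignored.
    """
    rows = []
    for stage in stages:
        start = stage.get("submissionTime")
        end = stage.get("completionTime")
        if not start or not end:
            continue
        elapsed = datetime.strptime(end, TS_FORMAT) - datetime.strptime(start, TS_FORMAT)
        rows.append((stage["stageId"], stage.get("name", ""), elapsed.total_seconds()))
    return sorted(rows, key=lambda row: row[2], reverse=True)


# Example (placeholder host and application id):
# for stage_id, name, secs in stage_durations(
#         fetch_stages("http://history-server:18080", "application_123_0001")):
#     print(f"stage {stage_id}: {secs / 60:.1f} min  {name}")
```

   Running this once for the MOR job and once for the COW job should show which 
stages account for the 40 vs 17 minutes. For applications with multiple 
attempts, the path may need the attempt id after the application id.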


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

