[GitHub] [iceberg] holdenk opened a new issue, #8010: Default Range Partitioning w/Sorted Table causes performance issues

via GitHub Fri, 07 Jul 2023 09:22:50 -0700


holdenk opened a new issue, #8010:
URL: https://github.com/apache/iceberg/issues/8010


   ### Apache Iceberg version
   
   1.1.0
   
   ### Query engine
   
   Spark
   
   ### Please describe the bug 🐞
   
   After upgrading to Iceberg 1.1 with Spark 3.3, we noticed that writes to a table with a 
sort order specified resulted in a range partitioning with only two partitions 
during the write phase (for an output of 22 TB).
   
   As a workaround, we can set the write distribution mode to none and do our own 
sort (as before), but ideally Iceberg (or Spark?) could do something "smarter" 
and trigger a sort with some "extra" bits so that we don't 
end up with huge stuck partitions.
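
For reference, the workaround above could be sketched roughly as follows. This is a minimal illustration, not the exact code used: it assumes the table-level property `write.distribution-mode` (values `none`, `hash`, `range`), and the table name `db.events`, the column `event_ts`, and the helper `distribution_mode_sql` are hypothetical names introduced only for this example.

```python
# Sketch: build the Spark SQL statement that switches an Iceberg table's
# write distribution mode, so that the job can do its own sort instead.
# Assumption: the table property is `write.distribution-mode`.

ALLOWED_MODES = {"none", "hash", "range"}


def distribution_mode_sql(table: str, mode: str = "none") -> str:
    """Return an ALTER TABLE statement setting the write distribution mode."""
    if mode not in ALLOWED_MODES:
        raise ValueError(f"unsupported distribution mode: {mode!r}")
    return (
        f"ALTER TABLE {table} "
        f"SET TBLPROPERTIES ('write.distribution-mode'='{mode}')"
    )


if __name__ == "__main__":
    # Hypothetical table name, for illustration only.
    print(distribution_mode_sql("db.events"))

    # With a live SparkSession named `spark` and a DataFrame `df`, the
    # statement would be applied and followed by a manual sort, e.g.:
    #   spark.sql(distribution_mode_sql("db.events"))
    #   df.sortWithinPartitions("event_ts").writeTo("db.events").append()
```

With the mode set to `none`, the writer is not expected to request its own shuffle, so the number of write tasks follows whatever partitioning the job itself produced before the write.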


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

