kaushikranjan commented on issue #12704:
URL: https://github.com/apache/iceberg/issues/12704#issuecomment-2779488001

   FYI: we are facing the same issue in our cluster as well.
   We have an Iceberg table with the following schema:
   
   CREATE TABLE iceberg.user (
      customer_id VARCHAR(100) NOT NULL,
      id VARCHAR(100) NOT NULL,
      created_on TIMESTAMP(6) NOT NULL,
      updated_on TIMESTAMP(6) NOT NULL
   )
   WITH (
       format = 'PARQUET',
       format_version = 2,
       partitioning = ARRAY['bucket(customer_id, 20)'],
       sorted_by = ARRAY['id']
   );
   
   customer_id and id are both GUID values and unique.
   
   Here is the data distribution, which is fairly even across all partitions:
   <img width="558" alt="Image" src="https://github.com/user-attachments/assets/9f9e312e-335f-41fa-a3f0-d0bca3f45a24" />
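   
   A per-partition breakdown like this can be read from the table's partition metadata. The query below is a minimal sketch, assuming the table is queried through Trino's Iceberg connector (which the `WITH` properties in the DDL suggest) and that `iceberg` resolves to the schema holding the table; adjust the qualified name for other setups.

   ```sql
   -- Assumption: Trino Iceberg connector; "$partitions" is its partition metadata table.
   -- Each row corresponds to one bucket of bucket(customer_id, 20).
   SELECT
       partition,
       record_count,
       file_count,
       total_size
   FROM iceberg."user$partitions"
   ORDER BY record_count DESC;
   ```

   With 20 buckets and random GUIDs in customer_id, the record_count values should come out roughly equal if the data is evenly spread.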
   
   We hit the same issue when running compaction.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

