Re: [I] Compaction results in Out Of Memory for >=million records [iceberg]

via GitHub Fri, 04 Apr 2025 11:35:18 -0700


kaushikranjan commented on issue #12704:
URL: https://github.com/apache/iceberg/issues/12704#issuecomment-2779488001


   FYI - We have been facing the same issue in our cluster as well.
   We have an Iceberg table with the following schema:
   
   CREATE TABLE iceberg.user (
      customer_id VARCHAR(100) NOT NULL,
      id VARCHAR(100) NOT NULL,
      created_on TIMESTAMP(6) NOT NULL,
      updated_on TIMESTAMP(6) NOT NULL
   )
   WITH (
       format = 'PARQUET',
       format_version = 2,
       partitioning = ARRAY['bucket(customer_id, 20)'],
       sorted_by = ARRAY['id']
   );
   
   customer_id and id are both GUID values and are unique.
   
   Here is the data distribution, which is fairly even across all partitions:
   <img width="558" alt="Image" src="https://github.com/user-attachments/assets/9f9e312e-335f-41fa-a3f0-d0bca3f45a24" />
   
   When running compaction on this table, we hit the same out-of-memory issue.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

