ravindrabajpai opened a new issue, #12431: URL: https://github.com/apache/iceberg/issues/12431
### Proposed Change I have recently started exploring iceberg table format, and I feel hidden partitioning is a very powerful feature. I would like to explore it a bit further to see if following case can be possible. Please ignore this proposal and point me to an alternate solution if there is any. I would like to explore and propose the following type of partitioning mechanism explained with example - For example a table with a specific partitioning structure that combines a "status" partition (active/inactive) with a date partition (business_date) for inactive records only. It solves my small file issue for status=active partition, where if I don't have business date as sub partition, then I can have a few big files right inside the status=active foloder. 1. Proposed/Expected Table Schema and Partitioning Strategy Let's assume we have a table with columns like: id (INT): Unique identifier status (STRING): "active" or "inactive" business_date (DATE): Date of the business event data (STRING): Some data Partitioning strategy will be: status: Top-level partition (active/inactive) business_date: Sub-partition only for status = 'inactive'. Creating the Iceberg Table - CREATE TABLE my_iceberg_table ( id INT, status STRING, business_date DATE, data STRING ) USING iceberg PARTITIONED BY (status, CASE WHEN status = 'inactive' THEN business_date ELSE null END); Here - PARTITIONED BY (status, CASE WHEN status = 'inactive' THEN business_date ELSE null END): This is the key part. It partitions first by status. The CASE statement creates the sub-partition: If status is 'inactive', it partitions by business_date. If status is 'active', it partitions by NULL. 3. Ingesting Records INSERT INTO my_iceberg_table VALUES (1, 'active', '2023-10-26', 'Active data 1'), (2, 'active', '2023-10-27', 'Active data 2'), (3, 'inactive', '2023-10-26', 'Inactive data 1'), (4, 'inactive', '2023-10-27', 'Inactive data 2'), (5, 'inactive', '2023-10-27', 'Inactive data 3'); SELECT * FROM my_iceberg_table WHERE status = 'active'; SELECT * FROM my_iceberg_table WHERE status = 'inactive' AND business_date = '2023-10-27'; ### Proposal document _No response_ ### Specifications - [x] Table - [ ] View - [ ] REST - [ ] Puffin - [ ] Encryption - [ ] Other -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org