Re: [PR] Core: Support Distributed Scan For Partitions Metadata Table [iceberg]

via GitHub Mon, 25 Aug 2025 10:32:23 -0700


RussellSpitzer commented on PR #13903:
URL: https://github.com/apache/iceberg/pull/13903#issuecomment-3221135831


   I think this is a great idea, but the implementation is relatively heavy. 
I don't think we need to do much more than swap the PartitionsTable 
implementation for the "ManifestEntries" table implementation 
(+ a group and distinct?).
   
   
   I also think this complexity is probably not needed:
   ```
   New Planning Modes: Added LOCAL, DISTRIBUTED, and AUTO modes via METADATA_PLANNING_MODE table property
   Auto-switching: Automatically uses distributed scanning when manifest count exceeds configurable threshold (default: 10)
   Enhanced PartitionsTable: Implements DistributedPartitionsScan for parallel manifest processing
   Comprehensive Testing: Added tests for core functionality and Spark integration (v3.5, v4.0)
   Backward Compatibility: Existing behavior preserved with AUTO mode as default
   ```
   
   We can probably just swap out the implementation and be done with it. Almost 
everyone I know who relies on the "partitions" table actually runs a "SELECT * 
FROM FILES/ENTRIES GROUP BY partition" anyway.
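
To make the "group by partition" idea concrete, here is a minimal sketch in plain Python. It does not use the Iceberg API. The row layout (`partition`, `file_path`, `record_count`) and the helper `summarize_by_partition` are hypothetical stand-ins for rows of an entries/files metadata table, and the real PartitionsTable exposes more columns than the two aggregates shown.

```python
from collections import defaultdict

# Hypothetical, simplified stand-ins for rows of an entries/files metadata
# table. Field names and values are illustrative only, not Iceberg's schema.
entries = [
    {"partition": ("2025-08-01",), "file_path": "a.parquet", "record_count": 100},
    {"partition": ("2025-08-01",), "file_path": "b.parquet", "record_count": 50},
    {"partition": ("2025-08-02",), "file_path": "c.parquet", "record_count": 70},
]


def summarize_by_partition(rows):
    """Group entry rows by partition and aggregate per-partition stats."""
    summary = defaultdict(lambda: {"file_count": 0, "record_count": 0})
    for row in rows:
        stats = summary[row["partition"]]
        stats["file_count"] += 1
        stats["record_count"] += row["record_count"]
    return dict(summary)


partition_summary = summarize_by_partition(entries)
```

Under these assumptions, the per-partition result is just an aggregate over the entry rows, so whichever engine scans the entries can also do the grouping.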


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
