[GitHub] [iceberg] rdblue commented on pull request #6369: Increase Partition Start Id to 10000

GitBox Wed, 07 Dec 2022 17:03:44 -0800


rdblue commented on PR #6369:
URL: https://github.com/apache/iceberg/pull/6369#issuecomment-1341825580


   Looks like @RussellSpitzer, @szehon-ho, and @aokolnychyi are looking at this 
and have noted the issues with v1 tables.
   
   I think that this is risky because not all v1 readers will use the partition 
field IDs, even though we do write them into partition specs now. Currently, we 
are careful that the written IDs always match what such a reader would assume, 
but this change would cause them to differ. It may be safe, but I'd test very 
thoroughly and possibly put this behind a flag.
   
   I'd also like to understand why this is needed. Partition field IDs are 
stored in manifest files, not data files. Partition field IDs should generally 
not mix with data field IDs from the Iceberg schema.
   
   The only case I can think of right now is projecting the `_partition` 
metadata field when reading a table... but in that case I think there needs to 
be a better solution. Running into a collision at 10,000 fields is still 
possible with this PR. We should just assign new field IDs to the `_partition` 
metadata fields.
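
A minimal sketch of the collision described above, in plain Python rather than the Iceberg API. It assumes partition field IDs are assigned sequentially from a start value (1000 today, 10000 with this PR) and that data field IDs simply count up from 1. The helper names, and the choice to reassign IDs just above the schema's highest ID, are hypothetical and only illustrate the idea:

```python
# Illustrative only: plain Python, not the Iceberg API.
PARTITION_ID_START = 1000  # assumed current start for partition field IDs


def partition_field_ids(num_partition_fields, start=PARTITION_ID_START):
    """Sequentially assigned partition field IDs, beginning at `start`."""
    return list(range(start, start + num_partition_fields))


def collisions(schema_field_ids, partition_ids):
    """IDs that appear in both the data schema and the partition struct."""
    return sorted(set(schema_field_ids) & set(partition_ids))


def reassign_partition_ids(schema_field_ids, num_partition_fields):
    """Hypothetical fix: give the projected `_partition` fields fresh IDs
    above the highest ID used by the data schema."""
    next_id = max(schema_field_ids) + 1
    return list(range(next_id, next_id + num_partition_fields))


# A table with 1,200 data columns (IDs 1..1200) and 2 partition fields.
wide_schema = list(range(1, 1201))
current = collisions(wide_schema, partition_field_ids(2))

# Raising the start to 10000 only moves the problem to wider schemas.
very_wide_schema = list(range(1, 10002))
raised = collisions(very_wide_schema, partition_field_ids(2, start=10000))

# Fresh IDs derived from the schema itself cannot overlap with it.
fresh = reassign_partition_ids(very_wide_schema, 2)
remaining = collisions(very_wide_schema, fresh)
```

Under these assumptions, both fixed start values collide once the schema is wide enough, while IDs chosen relative to the schema never do.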


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


