ookumuso commented on PR #11112: URL: https://github.com/apache/iceberg/pull/11112#issuecomment-2372409237
> > @danielcweeks let me know what you think about this. One alternative is to maybe provide both as in having S3LocationProvider as is and a base2 option for the ObjectStoreLocationProvider to keep the option for partition values along with directories
>
> @ookumuso I'm still of the opinion that we should just incorporate this into the existing object store location provider. Ultimately, we can change the hashing behavior as the existing behavior is based on earlier discussions with S3, so replacing it or providing alternative options is fine.
>
> I do think partition values in the path still needs to be an option.
>
> I think we also need to consider adding a path separator to help with some of the maintenance routines. For example `s3://bucket/<prefix>/<db>/<table>/data/00101001/0010100111011...`. The `/` in the hash portion allows for `2^8` prefixes to allow listing under those paths for clients that list. If we omit that, we get `2^24` prefixes that need to be listed under the data directory, which makes maintenance difficult.

@danielcweeks Understood, thanks for the feedback, Daniel. It looks like it is not going to be feasible for us to remove the `/` delimiter. Given that, I agree that it doesn't make sense to offer a separate provider. I think it would be better to update the existing provider to use base2 directly instead of making it opt-in, which simplifies onboarding because users see the auto-scaling improvement right away.

@jackye1995 Are you also okay with switching ObjectStoreLocationProvider to use base2 by default? I can potentially cut the hash down from 24 bits to 20 or even 16 to shorten the path a bit.

For partition values, I can probably send a separate follow-up that adds a new table property to optionally remove them from the file name, so callers can decide, but I'm planning to exclude that for now.
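To make the `2^8` versus `2^24` listing argument concrete, here is a minimal Python sketch of a base2 hash path with a `/` after the first 8 bits. It assumes SHA-256 as a stand-in hash, a 24-bit hash by default, and the file name appended as the last path segment. These are illustrative assumptions, and `binary_hash`, `object_store_path`, and `first_level_prefixes` are hypothetical helpers, not Iceberg APIs.

```python
import hashlib
from typing import Optional


def binary_hash(file_name: str, bits: int = 24) -> str:
    """Return the top `bits` bits of a hash of `file_name` as a string of 0s and 1s.

    SHA-256 is an illustrative stand-in, not necessarily the hash Iceberg uses.
    """
    if not 1 <= bits <= 32:
        raise ValueError("bits must be between 1 and 32")
    digest = hashlib.sha256(file_name.encode("utf-8")).digest()
    value = int.from_bytes(digest[:4], "big")  # first 32 bits of the digest
    return format(value >> (32 - bits), f"0{bits}b")  # top `bits` bits, zero-padded


def object_store_path(table_location: str, file_name: str,
                      bits: int = 24, dir_bits: int = 8) -> str:
    """Build a hypothetical data-file path with a '/' after the first `dir_bits` hash bits."""
    if not 0 < dir_bits < bits:
        raise ValueError("dir_bits must be between 1 and bits - 1")
    h = binary_hash(file_name, bits)
    return f"{table_location}/data/{h[:dir_bits]}/{h[dir_bits:]}/{file_name}"


def first_level_prefixes(bits: int, dir_bits: Optional[int]) -> int:
    """Distinct first-level directories under data/ that a listing client must walk."""
    # With a separator, only the first `dir_bits` bits form a directory name;
    # without one, the whole hash is a single path component.
    return 2 ** (dir_bits if dir_bits is not None else bits)


if __name__ == "__main__":
    print(object_store_path("s3://bucket/prefix/db/table", "00000-0-data.parquet"))
    print(first_level_prefixes(24, 8), first_level_prefixes(24, None))  # 256 16777216
```

Passing `bits=20` or `bits=16` models the shorter hashes floated in the comment. With the separator kept after 8 bits, the number of first-level directories a maintenance job has to list stays at `2**8` regardless of the total hash length.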