ookumuso commented on PR #11112:
URL: https://github.com/apache/iceberg/pull/11112#issuecomment-2372409237

   > > @danielcweeks let me know what you think about this. One alternative is 
to maybe provide both as in having S3LocationProvider as is and a base2 option 
for the ObjectStoreLocationProvider to keep the option for partition values 
along with directories
   > 
   > @ookumuso I'm still of the opinion that we should just incorporate this 
into the existing object store location provider. Ultimately, we can change the 
hashing behavior as the existing behavior is based on earlier discussions with 
S3, so replacing it or providing alternative options is fine.
   > 
   > I do think partition values in the path still need to be an option.
   > 
   > I think we also need to consider adding a path separator to help with some 
of the maintenance routines. For example 
`s3://bucket/<prefix>/<db>/<table>/data/00101001/0010100111011...`. The `/` in 
the hash portion allows for `2^8` prefixes to allow listing under those paths 
for clients that list. If we omit that, we get `2^24` prefixes that need to be 
listed under the data directory, which makes maintenance difficult.
   
   @danielcweeks Understood, thanks for the feedback, Daniel. It looks like it is 
not going to be feasible for us to remove the `/` delimiter. Given that, I agree 
that it doesn't make sense to offer a separate provider. I think it would be 
better to update the existing provider to use base2 directly rather than making 
it opt-in, which simplifies onboarding because users see the auto-scaling 
improvement right away. @jackye1995 Are you also okay with switching 
ObjectStoreLocationProvider to use base2 by default? I can potentially cut the 
hash down from 24 bits to 20 or even 16 to reduce the path length a bit.
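
To make the trade-off concrete, here is a rough Python sketch of a base2 hash 
path with a `/` after the first 8 bits, plus the number of prefixes a listing 
client would see directly under the data directory. Everything here is an 
assumption for illustration only: the helper names (`binary_hash`, 
`hashed_path`, `top_level_prefixes`) are hypothetical, SHA-256 is just a 
stand-in hash, and the layout after the first separator is a guess rather than 
what the PR actually implements.

```python
import hashlib


def binary_hash(name: str, bits: int = 24) -> str:
    """Return the top `bits` bits of a hash of `name` as a 0/1 string.

    SHA-256 is only a stand-in here; the real provider may hash differently.
    """
    digest = hashlib.sha256(name.encode("utf-8")).digest()
    value = int.from_bytes(digest, "big") >> (len(digest) * 8 - bits)
    return format(value, f"0{bits}b")


def hashed_path(data_dir: str, file_name: str, bits: int = 24, dir_bits: int = 8) -> str:
    """Build a path whose hash portion is split by `/` after `dir_bits` bits.

    The placement of the remaining bits is an assumed layout, not the PR's.
    """
    h = binary_hash(file_name, bits)
    return f"{data_dir}/{h[:dir_bits]}/{h[dir_bits:]}/{file_name}"


def top_level_prefixes(bits: int, dir_bits=None) -> int:
    """Count distinct prefixes directly under the data directory.

    With a separator after `dir_bits` bits only 2**dir_bits prefixes appear;
    without one, every distinct hash value is its own prefix (2**bits).
    """
    return 2 ** (dir_bits if dir_bits is not None else bits)


if __name__ == "__main__":
    print(hashed_path("s3://bucket/db/table/data", "00000-0-file.parquet"))
    for total_bits in (24, 20, 16):
        print(total_bits, top_level_prefixes(total_bits, 8), top_level_prefixes(total_bits))
```

With the separator, the count directly under the data directory stays at 
`2^8 = 256` regardless of whether the hash is 24, 20, or 16 bits long. Without 
it, the count is `2^24`, `2^20`, or `2^16` respectively.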
   
   For partition values, I can probably send a separate follow-up that adds an 
option to remove them from the file name via a new table property, so callers 
can decide, but I'm planning to exclude that for now.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

