szehon-ho commented on code in PR #6698: URL: https://github.com/apache/iceberg/pull/6698#discussion_r1131801430
########## hive-metastore/src/main/java/org/apache/iceberg/hive/CachedClientPool.java: ########## @@ -53,12 +70,14 @@ public class CachedClientPool implements ClientPool<IMetaStoreClient, TException properties, CatalogProperties.CLIENT_POOL_CACHE_EVICTION_INTERVAL_MS, CatalogProperties.CLIENT_POOL_CACHE_EVICTION_INTERVAL_MS_DEFAULT); + this.keySuppliers = Review Comment: While I like the keySupplier idea, wouldn't it be much simpler if we just use a static map of suppliers, like: ``` static final Map<String, Supplier> keySuppliers = ImmutableMap.of( "uri", () -> { try { return UserGroupInformation.getCurrentUser(); } catch (IOException e) { throw new UncheckedIOException(e); } }), "user_name" -> { () -> { try { String userName = UserGroupInformation.getCurrentUser().getUserName(); return UserNameElement.of(userName); } catch (IOException e) { throw new UncheckedIOException(e); } }); } ``` (as i dont see any need to generate this on the fly.) And then here in CachedClientPool CTOR, we can just use key to use the conf to make a key? ########## core/src/main/java/org/apache/iceberg/CatalogProperties.java: ########## @@ -119,6 +119,26 @@ private CatalogProperties() {} "client.pool.cache.eviction-interval-ms"; public static final long CLIENT_POOL_CACHE_EVICTION_INTERVAL_MS_DEFAULT = TimeUnit.MINUTES.toMillis(5); + /** + * A comma separated list of elements that are used to compose the key of the client pool cache. + * + * <p>The following elements are supported: + * + * <ul> + * <li>URI - as specified by {@link CatalogProperties#URI}. URI will be the only element when Review Comment: Actually , I am now thinking I don't see any use-case where you would turn off any of these from the key. What do you think? From user point of view, probably simpler is better and we should have reasonable defaults. Maybe we could have a configurable key but just limit it to to add additional hive conf if we missed something, as a safety valve. In any case, without dynamic loading, the user cant add additional code suppliers like ugi, and thus is limited to just setting config values here. cc @pvary @RussellSpitzer @flyrain for thoughts. ########## hive-metastore/src/main/java/org/apache/iceberg/hive/CachedClientPool.java: ########## @@ -87,4 +106,125 @@ public <R> R run(Action<R, IMetaStoreClient, TException> action, boolean retry) throws TException, InterruptedException { return clientPool().run(action, retry); } + + @VisibleForTesting + static Key toKey(List<Supplier<Object>> suppliers) { + return Key.of(suppliers.stream().map(Supplier::get).collect(Collectors.toList())); + } + + @VisibleForTesting + static List<Supplier<Object>> extractKeySuppliers(String cacheKeys, Configuration conf) { + URIElement uri = URIElement.of(conf.get(HiveConf.ConfVars.METASTOREURIS.varname, "")); + if (cacheKeys == null || cacheKeys.isEmpty()) { + return Collections.singletonList(() -> uri); + } + + // generate key elements in a certain order, so that the Key instances are comparable + Set<KeyElementType> types = Sets.newTreeSet(Comparator.comparingInt(Enum::ordinal)); + Map<String, String> confElements = Maps.newTreeMap(); + for (String element : cacheKeys.split(",", -1)) { + String trimmed = element.trim(); + if (trimmed.toLowerCase(Locale.ROOT).startsWith(CONF_ELEMENT_PREFIX)) { + String key = trimmed.substring(CONF_ELEMENT_PREFIX.length()); + ValidationException.check( + !confElements.containsKey(key), "Conf key element %s already specified", key); + confElements.put(key, conf.get(key)); + } else { + KeyElementType type = KeyElementType.valueOf(trimmed.toUpperCase()); + switch (type) { + case URI: + case UGI: + case USER_NAME: + ValidationException.check( + types.add(type), "%s key element already specified", type.name()); + break; + default: + throw new ValidationException("Unknown key element %s", trimmed); + } + } + } + ImmutableList.Builder<Supplier<Object>> suppliers = ImmutableList.builder(); + for (KeyElementType type : types) { + switch (type) { + case URI: + suppliers.add(() -> uri); + break; + case UGI: + suppliers.add( + () -> { + try { + return UserGroupInformation.getCurrentUser(); + } catch (IOException e) { + throw new UncheckedIOException(e); + } + }); + break; + case USER_NAME: + suppliers.add( + () -> { + try { + String userName = UserGroupInformation.getCurrentUser().getUserName(); + return UserNameElement.of(userName); + } catch (IOException e) { + throw new UncheckedIOException(e); + } + }); + break; + default: + throw new RuntimeException("Unexpected key element " + type.name()); + } + } + for (String key : confElements.keySet()) { + ConfElement element = ConfElement.of(key, confElements.get(key)); + suppliers.add(() -> element); + } + return suppliers.build(); + } + + @Value.Immutable Review Comment: I'm strugling to see the value of adding these immutable value, although I may be missing something. Why not just use String? ########## core/src/main/java/org/apache/iceberg/CatalogProperties.java: ########## @@ -119,6 +119,26 @@ private CatalogProperties() {} "client.pool.cache.eviction-interval-ms"; public static final long CLIENT_POOL_CACHE_EVICTION_INTERVAL_MS_DEFAULT = TimeUnit.MINUTES.toMillis(5); + /** + * A comma separated list of elements that are used to compose the key of the client pool cache. + * + * <p>The following elements are supported: + * + * <ul> + * <li>URI - as specified by {@link CatalogProperties#URI}. URI will be the only element when Review Comment: I think URI should be removed here as it shouldn't be configurable, as it is always there. Maybe we could mention in the javadoc that uri is always used , no matter what configurable keys the user passes in? Something like: ``` A comma separated list of elements used, in addition to the hive metastore uri, to compose the key of the client pool cache. ``` ########## core/src/main/java/org/apache/iceberg/CatalogProperties.java: ########## @@ -119,6 +119,26 @@ private CatalogProperties() {} "client.pool.cache.eviction-interval-ms"; public static final long CLIENT_POOL_CACHE_EVICTION_INTERVAL_MS_DEFAULT = TimeUnit.MINUTES.toMillis(5); + /** + * A comma separated list of elements that are used to compose the key of the client pool cache. + * + * <p>The following elements are supported: + * + * <ul> + * <li>URI - as specified by {@link CatalogProperties#URI}. URI will be the only element when + * this property is not set. + * <li>UGI - the Hadoop UserGroupInformation instance that represents the current user using the + * cache. + * <li>USER_NAME - similar to UGI but only includes the user's name determined by + * UserGroupInformation#getUserName. + * <li>CONF - name of an arbitrary configuration. The value of the configuration will be Review Comment: Can we also add a example what the user can pass in here? Also, what do you think about make the list here show lower-case, to match what you wrote (ie, there is CONF but then you mention conf.a.b.c) -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org