[
https://issues.apache.org/jira/browse/HADOOP-19447?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17938415#comment-17938415
]
ASF GitHub Bot commented on HADOOP-19447:
-----------------------------------------
yangjiandan commented on code in PR #7527:
URL: https://github.com/apache/hadoop/pull/7527#discussion_r2013342166
##########
hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/security/SecurityUtil.java:
##########
@@ -586,17 +592,62 @@ InetAddress getByName(String hostname) throws UnknownHostException {
       return hostResolver.getByName(hostname);
     }
   }
-
+
   interface HostResolver {
-    InetAddress getByName(String host) throws UnknownHostException;
+    InetAddress getByName(String host) throws UnknownHostException;
+  }
+
+  static abstract class CacheableHostResolver implements HostResolver {
+    private volatile LoadingCache<String, InetAddress> cache;
+
+    CacheableHostResolver(int expiryIntervalSecs) {
+      if (expiryIntervalSecs > 0) {
+        cache = CacheBuilder.newBuilder()
+            .expireAfterWrite(expiryIntervalSecs, TimeUnit.SECONDS)
+            .build(new CacheLoader<String, InetAddress>() {
+              @Override
+              public InetAddress load(String key) throws Exception {
+                return resolve(key);
+              }
+            });
+      }
+    }
+    protected abstract InetAddress resolve(String host) throws UnknownHostException;
+
+    @Override
+    public InetAddress getByName(String host) throws UnknownHostException {
+      if (cache != null) {
+        try {
+          return cache.get(host);
+        } catch (Exception e) {
+          Throwable cause = e.getCause();
+          if (cause instanceof UnknownHostException) {
+            throw (UnknownHostException) cause;
+          }
+          throw new UnknownHostException("Error resolving host " + host +
+              ": " + cause.getMessage());
Review Comment:
You are right!
I'll fix this potential error.
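
The comment above does not restate the error. One hazard that is visible in the quoted lines is that Throwable#getCause() may return null, in which case cause.getMessage() would itself throw a NullPointerException. The following is only a sketch of a null-safe conversion, not the change that was actually made in the PR; the class and method names (ResolveErrors, toUnknownHost) are hypothetical.

```java
import java.net.UnknownHostException;

/** Hypothetical helper, not part of SecurityUtil or the PR. */
final class ResolveErrors {
  private ResolveErrors() {
  }

  /**
   * Converts an exception caught around a cache lookup into an
   * UnknownHostException without dereferencing a possibly null cause.
   */
  static UnknownHostException toUnknownHost(String host, Exception e) {
    Throwable cause = e.getCause();
    if (cause instanceof UnknownHostException) {
      // The loader failed with the expected exception type: pass it through.
      return (UnknownHostException) cause;
    }
    // Fall back to the caught exception itself when there is no cause.
    Throwable root = (cause != null) ? cause : e;
    UnknownHostException wrapped = new UnknownHostException(
        "Error resolving host " + host + ": " + root);
    // Keep the original stack trace reachable for debugging.
    wrapped.initCause(root);
    return wrapped;
  }
}
```

With a helper of this shape, the catch block in getByName could reduce to a single `throw ResolveErrors.toUnknownHost(host, e);`.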
> Add Caching Mechanism to HostResolver to Avoid Redundant Hostname Resolutions
> -----------------------------------------------------------------------------
>
> Key: HADOOP-19447
> URL: https://issues.apache.org/jira/browse/HADOOP-19447
> Project: Hadoop Common
> Issue Type: New Feature
> Components: common, yarn
> Reporter: Jiandan Yang
> Priority: Major
> Labels: pull-request-available
>
> *Background:*
>
> Currently, the two implementations of
> org.apache.hadoop.security.SecurityUtil.HostResolver, *StandardHostResolver
> and QualifiedHostResolver*, perform hostname resolution each time
> they are called. *Each heartbeat between the AM and RM causes the RM to invoke
> the* HostResolver#getByName *method once*. In large-scale clusters
> running numerous applications, this results in *a high frequency of redundant
> hostname resolutions.*
>
> *Proposal:*
>
> Introduce a caching mechanism in HostResolver to store resolved hostnames for
> a configurable duration. This would:
> • Reduce redundant DNS queries.
> • Improve performance for frequently used hostnames.
> • Allow configuration options for cache size and TTL (Time-to-Live).
>
> *Suggested Implementation:*
> 1. *Leverage Existing CachedResolver*:
> The NodesListManager.CachedResolver class in Hadoop already implements a
> caching mechanism for hostname resolution. Instead of introducing an entirely
> new solution, we propose *extracting the caching logic from*
> NodesListManager.CachedResolver *into a separate reusable utility class*.
> 2. *Create a Shared Caching Utility*:
> • Extract the caching logic from NodesListManager.CachedResolver.
> • Implement a new class, e.g., HostnameCache, and place it in the Hadoop
> Common module to ensure it can be used across different components.
> 3. *Integrate* HostnameCache with *HostResolver & QualifiedHostResolver*:
> • Modify HostResolver to use HostnameCache for hostname lookups.
> • Update NodesListManager.CachedResolver to use HostnameCache instead of its
> own internal cache.
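
To make the proposal concrete, here is a minimal sketch of a TTL cache placed in front of a resolver. It uses only the JDK rather than the Guava LoadingCache quoted in the diff, and every name in it (Resolver, TtlCachingResolver, the injected nano clock) is an assumption made for illustration, not the HostnameCache API from the PR.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.LongSupplier;

/** Hypothetical stand-in for SecurityUtil.HostResolver. */
interface Resolver {
  InetAddress getByName(String host) throws UnknownHostException;
}

/** Hypothetical TTL cache wrapping any Resolver. */
final class TtlCachingResolver implements Resolver {
  private static final class Entry {
    final InetAddress address;
    final long expiresAtNanos;

    Entry(InetAddress address, long expiresAtNanos) {
      this.address = address;
      this.expiresAtNanos = expiresAtNanos;
    }
  }

  private final Resolver delegate;
  private final long ttlNanos;
  private final LongSupplier nanoClock;
  private final ConcurrentMap<String, Entry> cache = new ConcurrentHashMap<>();

  TtlCachingResolver(Resolver delegate, long ttl, TimeUnit unit,
      LongSupplier nanoClock) {
    this.delegate = delegate;
    this.ttlNanos = unit.toNanos(ttl);
    this.nanoClock = nanoClock;
  }

  @Override
  public InetAddress getByName(String host) throws UnknownHostException {
    if (ttlNanos <= 0) {
      // A non-positive TTL disables caching, as in the quoted constructor.
      return delegate.getByName(host);
    }
    long now = nanoClock.getAsLong();
    Entry entry = cache.get(host);
    if (entry != null && now - entry.expiresAtNanos < 0) {
      return entry.address; // still fresh
    }
    // Failed lookups propagate to the caller and are not cached.
    InetAddress resolved = delegate.getByName(host);
    cache.put(host, new Entry(resolved, now + ttlNanos));
    return resolved;
  }
}

final class TtlCachingResolverDemo {
  public static void main(String[] args) throws UnknownHostException {
    AtomicInteger lookups = new AtomicInteger();
    // Fake delegate: counts calls and builds an address without touching DNS.
    Resolver fake = host -> {
      lookups.incrementAndGet();
      return InetAddress.getByAddress(host, new byte[] {10, 0, 0, 1});
    };
    Resolver cached =
        new TtlCachingResolver(fake, 30, TimeUnit.SECONDS, System::nanoTime);
    cached.getByName("rm.example");
    cached.getByName("rm.example");
    System.out.println(lookups.get()); // prints 1: second call hit the cache
  }
}
```

Two threads that miss on the same host at the same moment may both call the delegate in this sketch; a loading cache such as the one in the diff coalesces those concurrent loads. The sketch also has no size bound, which the proposal lists as a configuration option.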