[ 
https://issues.apache.org/jira/browse/HADOOP-7359?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13046809#comment-13046809
 ] 

E. Sammer commented on HADOOP-7359:
-----------------------------------

I like the idea of this, but I would not necessarily attach it to the notion of 
files, explicitly. I would propose a ClusterTopology SPI-style interface and a 
ClusterTopologyListener interface for those interested in topology changes. 
Ideally, all clients (either internal to Hadoop daemons or external tools) 
would ask implementations of ClusterTopology for the list of hosts.

An API off the top of my head:

ClusterTopology <<interface>>
  getNodes() : Set<Node>
  refresh()
  getListeners() : Set<ClusterTopologyListener>
  addListener(ClusterTopologyListener) : boolean (wasAdded)
  removeListener(ClusterTopologyListener) : boolean (wasRemoved)

ClusterTopologyListener <<interface>>
  onTopologyChange(ClusterTopology)
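
For illustration only, the outline above might translate to Java roughly as follows. This is a minimal sketch, not code from the attached diff: the Node class and the InMemoryClusterTopology implementation (including its stage() method) are hypothetical names added so the example compiles on its own, and the sketch is not thread-safe.

```java
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.Set;

/** Hypothetical node type; the outline above leaves Node undefined. */
final class Node {
    final String hostName;
    Node(String hostName) { this.hostName = hostName; }
    @Override public boolean equals(Object o) {
        return o instanceof Node && ((Node) o).hostName.equals(hostName);
    }
    @Override public int hashCode() { return hostName.hashCode(); }
}

/** Callback for parties interested in topology changes. */
interface ClusterTopologyListener {
    void onTopologyChange(ClusterTopology topology);
}

/** The SPI that clients would ask for the list of hosts. */
interface ClusterTopology {
    Set<Node> getNodes();
    void refresh();
    Set<ClusterTopologyListener> getListeners();
    boolean addListener(ClusterTopologyListener listener);    // true if it was added
    boolean removeListener(ClusterTopologyListener listener); // true if it was removed
}

/** Hypothetical in-memory implementation, e.g. a stand-in for tests. */
class InMemoryClusterTopology implements ClusterTopology {
    private final Set<ClusterTopologyListener> listeners = new LinkedHashSet<>();
    private Set<Node> pending = new LinkedHashSet<>();   // staged membership
    private Set<Node> current = Collections.emptySet();  // visible membership

    /** Stages a new membership; it only becomes visible on the next refresh(). */
    void stage(Set<Node> nodes) { pending = new LinkedHashSet<>(nodes); }

    @Override public Set<Node> getNodes() {
        return Collections.unmodifiableSet(current);
    }

    @Override public void refresh() {
        if (pending.equals(current)) {
            return; // nothing changed, so listeners are not notified
        }
        current = new LinkedHashSet<>(pending);
        // Iterate over a copy so a listener may remove itself while being notified.
        for (ClusterTopologyListener listener : new LinkedHashSet<>(listeners)) {
            listener.onTopologyChange(this);
        }
    }

    @Override public Set<ClusterTopologyListener> getListeners() {
        return Collections.unmodifiableSet(listeners);
    }

    @Override public boolean addListener(ClusterTopologyListener listener) {
        return listeners.add(listener);
    }

    @Override public boolean removeListener(ClusterTopologyListener listener) {
        return listeners.remove(listener);
    }
}
```

Returning a boolean from addListener and removeListener falls out naturally when the listeners are kept in a Set, since Set.add and Set.remove already report whether the set changed.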

And *then* have a single class that implements ClusterTopology. Configure that 
class two ways, once for HDFS and once for MR (no need to have an inheritance 
hierarchy).

HostFileClusterTopology <<class>>
  /*
     A private member with a base file name. The implementation automatically
     looks for baseFileName + .{include,exclude}
   */
  baseFileName : File
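
A rough Java sketch of how such a class could behave is shown here. Several things are assumptions rather than part of the proposal: the files hold one host name per line, blank lines and lines starting with '#' are skipped, a missing file counts as empty, and membership is taken to be the included hosts minus the excluded ones. To keep the block self-contained it uses plain host-name strings and Runnable callbacks in place of Node and ClusterTopologyListener.

```java
import java.io.File;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

class HostFileClusterTopology {
    // Base file name; the class looks for baseFileName + ".include" / ".exclude".
    private final File baseFileName;
    private final List<Runnable> listeners = new ArrayList<>();
    private Set<String> nodes = Collections.emptySet();

    HostFileClusterTopology(File baseFileName) {
        this.baseFileName = baseFileName;
    }

    Set<String> getNodes() {
        return Collections.unmodifiableSet(nodes);
    }

    void addListener(Runnable listener) {
        listeners.add(listener);
    }

    /** Re-reads both files and notifies listeners only if membership changed. */
    void refresh() {
        Set<String> updated = readHosts(".include");
        updated.removeAll(readHosts(".exclude")); // assumption: excludes win
        if (updated.equals(nodes)) {
            return;
        }
        nodes = updated;
        for (Runnable listener : listeners) {
            listener.run();
        }
    }

    /** Reads one host per line; a missing file yields an empty set. */
    private Set<String> readHosts(String suffix) {
        File file = new File(baseFileName.getPath() + suffix);
        Set<String> hosts = new TreeSet<>();
        if (!file.isFile()) {
            return hosts;
        }
        try {
            for (String line : Files.readAllLines(file.toPath(), StandardCharsets.UTF_8)) {
                String host = line.trim();
                if (!host.isEmpty() && !host.startsWith("#")) {
                    hosts.add(host);
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return hosts;
    }
}
```

Because the two file names are derived from one base name, the same class could be pointed at the HDFS lists or the MR lists purely through configuration.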

This, to me, seems like it would support the current file-based membership but 
also things like an RDBMS or ZK. In the case of files and an RDBMS (and other 
non-event-based systems) listeners wouldn't be notified of changes until a 
refresh() occurred. Alternatively, implementations could include a poller that 
automatically calls refresh() when a change is detected. This is also easy to 
mock out and test.
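
The poller idea could look something like the sketch below. RefreshPoller is a hypothetical name, and detecting a change by comparing the file's last-modified time is an assumption made for the example; the refresh action is passed in as a Runnable so the block stands alone.

```java
import java.io.File;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

class RefreshPoller {
    private final File watched;
    private final Runnable refresh;
    private long lastSeen = -1L; // no modification time observed yet

    RefreshPoller(File watched, Runnable refresh) {
        this.watched = watched;
        this.refresh = refresh;
    }

    /**
     * One polling step: runs the refresh action only if the file's
     * modification time differs from the last one seen.
     * Returns true if the refresh action ran.
     */
    synchronized boolean pollOnce() {
        long modified = watched.lastModified(); // 0 if the file does not exist
        if (modified == lastSeen) {
            return false;
        }
        lastSeen = modified;
        refresh.run();
        return true;
    }

    /**
     * Polls on a background thread. The caller shuts the returned executor
     * down. Note that if the refresh action throws, the executor stops
     * scheduling further polls.
     */
    ScheduledExecutorService start(long periodSeconds) {
        ScheduledExecutorService executor = Executors.newSingleThreadScheduledExecutor();
        executor.scheduleWithFixedDelay(this::pollOnce, periodSeconds, periodSeconds, TimeUnit.SECONDS);
        return executor;
    }
}
```

Keeping the single polling step in pollOnce(), separate from the scheduling, means the change detection can be tested without waiting on a timer.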

> Pluggable interface for cluster membership
> ------------------------------------------
>
>                 Key: HADOOP-7359
>                 URL: https://issues.apache.org/jira/browse/HADOOP-7359
>             Project: Hadoop Common
>          Issue Type: Improvement
>            Reporter: Travis Crawford
>         Attachments: HADOOP-7359.diff
>
>
> Currently Hadoop uses local files to determine cluster membership. With HDFS 
> for example, dfs.hosts and dfs.hosts.exclude are used.
> To enable tighter integrations, cluster membership should be an interface, 
> with the current file-based functionality provided as the default 
> implementation. The common case would be no functional change; however, sites 
> could plug in an alternative implementation, such as pulling the machine 
> lists from a machine database.
> DETAILS:
> Two machine lists, includes and excludes, are used to define cluster 
> membership and state. HostsFileReader currently handles reading these lists 
> from files, whose names are passed in by FSNamesystem for HDFS and JobTracker 
> for MR.
> The proposed change is adding a HostsReader interface to common, and changing 
> HostsFileReader to an abstract class that functions the same as today.
> Two new classes, DFSHostsFileReader and MRHostsFileReader, extend 
> HostsFileReader and simply pass the appropriate file names in. These new 
> classes are needed because config key names live outside common.
> Two new conf keys, defaulting to the file-based readers, would be added to 
> choose a different hosts reader: dfs.namenode.hosts.reader.class and 
> mapreduce.jobtracker.hosts.reader.class.
> Comments/suggestions? I have most of this written already but would love some 
> feedback on the general idea before posting the diff.
