: Can someone let me know how to configure DIH in a cloud environment? Should
: it point to one specific server, or to the load balancer for distributing DIH
: across all the servers?

: One problem with the load balancer approach is that there is no good way to
: tell whether a DIH job is already running in the cloud.

As far as I know there is no distributed support yet for running DIH. 
You should be able to run DIH full-import/delta-import commands on a single 
node and have all the documents distributed properly to the individual 
nodes/shards, but there is no automatic mechanism for ensuring that only 
one node is running DIH, or for querying the status of DIH from any node. 
So you would have to keep track of which node you're using to run DIH 
jobs, and if it goes down, pick a new node.
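
To make the "keep track of one node yourself" approach concrete, here is a 
rough client-side sketch, not anything Solr ships. The node URLs, the core 
name, and the /dataimport handler path are assumptions (the path is whatever 
your solrconfig.xml registers), and it assumes the DIH status response 
carries a "status" field of "idle" or "busy" when requested with wt=json.

```python
import json
import urllib.parse
import urllib.request

# Hypothetical node list and core name; adjust for your cluster.
NODES = ["http://solr1:8983/solr", "http://solr2:8983/solr"]
CORE = "collection1"
HANDLER = "/dataimport"  # assumed handler path from solrconfig.xml


def dih_url(node, command, **params):
    """Build a DIH request URL for one specific node (never the load balancer)."""
    query = urllib.parse.urlencode({"command": command, "wt": "json", **params})
    return f"{node}/{CORE}{HANDLER}?{query}"


def http_get_json(url, timeout=10):
    """Fetch a URL and decode the JSON body."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


def pick_dih_node(nodes, fetch=http_get_json):
    """Return the first node that answers a DIH status request."""
    for node in nodes:
        try:
            fetch(dih_url(node, "status"))
            return node
        except OSError:  # URLError and socket timeouts are OSError subclasses
            continue
    raise RuntimeError("no reachable DIH node")


def start_import(nodes, command="full-import", fetch=http_get_json):
    """Start an import on the preferred node unless one is already running there."""
    node = pick_dih_node(nodes, fetch)
    status = fetch(dih_url(node, "status"))
    if status.get("status") == "busy":
        return node, "already running"
    fetch(dih_url(node, command))
    return node, "started"
```

Note that this only asks the chosen node whether it is busy. It cannot tell 
whether some other node is also running an import, which is exactly the gap 
described above.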

I suspect it would be relatively straightforward to leverage ZooKeeper to 
assign a "DIH-leader" and ensure only one node is running DIH at a time. 
The question would be what to do about failover and state (most likely 
we'd also need to keep the last run and variable info in ZK). 
If you'd like to help contribute this type of functionality, I'm sure it 
would be appreciated.
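
To illustrate the idea, here is a toy sketch of how a "DIH-leader" plus 
shared state might behave. It is not existing Solr functionality and it does 
not use a real ZooKeeper client: FakeZk is a made-up in-memory stand-in that 
only mimics ephemeral nodes (removed when the owning session dies) and 
persistent nodes. The paths /dih/leader and /dih/last_index_time are 
hypothetical too.

```python
class FakeZk:
    """In-memory stand-in for a ZooKeeper ensemble (illustration only)."""

    def __init__(self):
        self.nodes = {}  # path -> (data, owning session or None if persistent)

    def create(self, path, data, session=None):
        """Create a node; fails if it exists. A session makes it ephemeral."""
        if path in self.nodes:
            return False
        self.nodes[path] = (data, session)
        return True

    def set(self, path, data):
        """Write a persistent node, overwriting any previous value."""
        self.nodes[path] = (data, None)

    def get(self, path):
        entry = self.nodes.get(path)
        return entry[0] if entry else None

    def expire_session(self, session):
        """Simulate a node going down: its ephemeral nodes disappear."""
        self.nodes = {p: v for p, v in self.nodes.items() if v[1] != session}


LEADER = "/dih/leader"            # hypothetical ephemeral leader marker
LAST_RUN = "/dih/last_index_time"  # hypothetical persistent DIH state


def is_leader(zk, node_id):
    """Claim leadership if it is free, then report whether we hold it."""
    zk.create(LEADER, node_id, session=node_id)
    return zk.get(LEADER) == node_id


def try_run_import(zk, node_id, run_import):
    """Run the import only on the leader, carrying last-run state through zk."""
    if not is_leader(zk, node_id):
        return False
    last = zk.get(LAST_RUN)       # state left behind by whoever ran last
    zk.set(LAST_RUN, run_import(last))
    return True
```

Because the last-run value lives in a persistent node rather than on the 
leader's local disk, a node that takes over after failover can pick up the 
delta from where the previous leader stopped. What to do about an import 
that was only half finished when the leader died is left open here.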



-Hoss
