The current scripts use rsync to minimize the amount of data actually
being copied.
I've had a brief look and found only one Java implementation of rsync,
which is GPL-licensed and abandoned:
http://sourceforge.net/projects/jarsync
Personally, I still think the size of the transfer is important (for
most use cases, not much actually changes every hour), but that's just
me; your case may be different from mine.
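To illustrate the concern about transfer size, here is a minimal Java sketch of a file-level comparison that would copy only new or changed index files. This is not rsync's rolling-checksum algorithm, just a simpler stand-in. It assumes the master can publish a listing of file names and sizes; the class IndexFileDiff and the method filesToFetch are hypothetical names, not part of Solr.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper, not part of Solr: given "file name -> size" listings
// for the master and the slave, work out which files the slave still needs.
class IndexFileDiff {

    static List<String> filesToFetch(Map<String, Long> masterFiles,
                                     Map<String, Long> slaveFiles) {
        List<String> toFetch = new ArrayList<>();
        // Copy into a TreeMap so the result has a stable, sorted order.
        for (Map.Entry<String, Long> entry : new TreeMap<>(masterFiles).entrySet()) {
            Long localSize = slaveFiles.get(entry.getKey());
            // Fetch when the file is missing locally or its size differs.
            if (localSize == null || !localSize.equals(entry.getValue())) {
                toFetch.add(entry.getKey());
            }
        }
        return toFetch;
    }
}
```

Comparing sizes alone is an assumption made to keep the sketch short; a real implementation would probably also want a checksum or timestamp before trusting that two files match.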
regards
Ian
Noble Paul നോബിള് नोब्ळ् wrote:
Hi,
The current replication strategy in Solr involves shell scripts. The
following are the drawbacks:
* It does not work on Windows
* Replication works as a separate piece, not integrated with Solr
* Replication cannot be controlled from Solr admin/JMX
* Each operation requires manual telnet to the host
Doing the replication within Java code has the following advantages:
* Platform independence
* Manual steps can be completely eliminated. Everything can be driven
from solrconfig.xml.
** Just putting the URL of the master in the slaves should be
enough to enable replication. Other things, like the frequency of
snapshoot/snappull, can also be configured
* Start/stop can be triggered from solr/admin or JMX
* Can get the status/progress while replication is going on
* No need to have a login on the machine
The implementation can be done as two components:
* A SolrEventListener which does a snapshoot, the same as is done by the script
* A ReplicationHandler which can act as a server to dish out the index
snapshots (in the master)
** In the slave, the same handler can poll at regular intervals and, if
there is a new snapshot, fetch the index over HTTP (it can use
solrj + BinaryResponseWriter)
* The same Handler can do a snap install
* The Handler may expose all the operations over a REST interface or JMX
* It may also show the current state of the master index through the console
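The slave-side polling described above could be sketched roughly as follows. This is only an illustration under stated assumptions: MasterClient and SnapshotPoller are hypothetical names, not existing Solr classes, and the HTTP transfer and snap install are reduced to placeholders.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical view of the master; in practice these would be HTTP calls
// to the ReplicationHandler running on the master.
interface MasterClient {
    long latestSnapshotVersion() throws Exception;
    void fetchSnapshot(long version) throws Exception;
}

// Hypothetical slave-side poller: checks the master at a fixed interval and
// pulls a snapshot only when the master has a newer one.
class SnapshotPoller {
    private final MasterClient master;
    private final AtomicLong installedVersion;
    private final ScheduledExecutorService scheduler =
            Executors.newSingleThreadScheduledExecutor();

    SnapshotPoller(MasterClient master, long initialVersion) {
        this.master = master;
        this.installedVersion = new AtomicLong(initialVersion);
    }

    // One poll cycle. Returns true if a newer snapshot was fetched.
    boolean pollOnce() {
        try {
            long latest = master.latestSnapshotVersion();
            if (latest <= installedVersion.get()) {
                return false; // nothing new on the master
            }
            master.fetchSnapshot(latest);
            installedVersion.set(latest); // placeholder for the snap install step
            return true;
        } catch (Exception e) {
            // Keep serving the old index and try again on the next cycle.
            return false;
        }
    }

    // Start polling; the interval would come from solrconfig.xml.
    void start(long intervalSeconds) {
        scheduler.scheduleWithFixedDelay(() -> { pollOnce(); },
                intervalSeconds, intervalSeconds, TimeUnit.SECONDS);
    }

    void stop() {
        scheduler.shutdownNow();
    }

    long installedVersion() {
        return installedVersion.get();
    }
}
```

Keeping pollOnce separate from the scheduler means an admin page or JMX bean could trigger the same cycle manually, and start/stop map directly onto the start/stop controls mentioned above.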
What do you think?