The current scripts use rsync to minimize the amount of data actually being copied.

I've had a brief look and found only one Java rsync implementation, which is GPL-licensed and abandoned:
http://sourceforge.net/projects/jarsync

Personally, I still think the size of the transfer is important (in most use cases not much actually changes every hour), but that's just me; your case may be different from mine.
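
To make the transfer-size point concrete, here is a rough sketch of a file-level comparison a Java replacement could do before copying anything. It is much coarser than rsync's block-level delta transfer, and everything in it is an assumption for illustration: the class name IndexFileDiff, the method filesToFetch, and the idea that each side can list its index files as a name-to-size map are all hypothetical, not existing Solr code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical helper: decides which index files a slave still needs. */
public class IndexFileDiff {

    /**
     * Returns the names of files that exist on the master but are either
     * missing on the slave or present with a different size.
     * Names are returned in sorted order so the result is deterministic.
     */
    public static List<String> filesToFetch(Map<String, Long> master,
                                            Map<String, Long> slave) {
        List<String> result = new ArrayList<>();
        // TreeMap gives a stable, sorted iteration order over master files.
        for (Map.Entry<String, Long> e : new TreeMap<>(master).entrySet()) {
            Long localSize = slave.get(e.getKey());
            if (localSize == null || !localSize.equals(e.getValue())) {
                result.add(e.getKey());
            }
        }
        return result;
    }
}
```

Comparing sizes alone can miss a file that changed without changing length, so a real version would probably compare checksums or timestamps as well.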

regards
Ian


Noble Paul നോബിള്‍ नोब्ळ् wrote:
Hi,
The current replication strategy in Solr relies on shell scripts. It has
the following drawbacks:
* It does not work on Windows
* Replication is a separate piece, not integrated with Solr
* Replication cannot be controlled from Solr admin/JMX
* Each operation requires a manual telnet to the host

Doing the replication in Java code has the following advantages:
* Platform independence
* Manual steps can be completely eliminated. Everything can be driven
from solrconfig.xml.
** Just putting the URL of the master in the slaves should be
enough to enable replication. Other things, like the frequency of
snapshoot/snappull, can also be configured
* Start/stop can be triggered from solr/admin or JMX
* Status/progress can be checked while replication is going on
* No need to have a login on the machine

The implementation can be done as two components:
* A SolrEventListener which does a snapshoot, the same as the script does today
* A ReplicationHandler which can act as a server to dish out the index
snapshots (in the master)
** In the slave, the same handler can poll at regular intervals and, if
there is a new snapshot, fetch the index over HTTP (it can use
SolrJ + BinaryResponseWriter)
* The same handler can do a snap install
* The handler may expose all the operations over a REST interface or JMX
* It may also show the current state of the master index through the console
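
The slave-side polling described above could look roughly like the sketch below. This is only an illustration under stated assumptions: SnapshotPoller, pollOnce and the two callbacks are hypothetical names, not an existing Solr API, and the actual HTTP calls to the master are left behind the callbacks so the logic can be shown on its own.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.LongConsumer;
import java.util.function.LongSupplier;

/** Hypothetical slave-side poller; none of these names come from Solr. */
public class SnapshotPoller {
    // Asks the master for the version of its latest snapshot (assumed to be over HTTP).
    private final LongSupplier masterVersion;
    // Fetches the given snapshot version and installs it locally (snappull + snapinstall).
    private final LongConsumer fetchAndInstall;
    // Version of the snapshot currently installed on this slave.
    private final AtomicLong installedVersion;

    public SnapshotPoller(LongSupplier masterVersion,
                          LongConsumer fetchAndInstall,
                          long initialVersion) {
        this.masterVersion = masterVersion;
        this.fetchAndInstall = fetchAndInstall;
        this.installedVersion = new AtomicLong(initialVersion);
    }

    /** Runs one poll cycle; returns true if a newer snapshot was fetched. */
    public boolean pollOnce() {
        long latest = masterVersion.getAsLong();
        if (latest <= installedVersion.get()) {
            return false; // nothing new on the master
        }
        fetchAndInstall.accept(latest);
        installedVersion.set(latest);
        return true;
    }

    public long installedVersion() {
        return installedVersion.get();
    }

    /** Polls at a fixed delay; the caller shuts the returned scheduler down to stop. */
    public ScheduledExecutorService start(long intervalSeconds) {
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        scheduler.scheduleWithFixedDelay(() -> {
            try {
                pollOnce();
            } catch (RuntimeException e) {
                // Swallow the failure so later polls still run; a real handler would log it.
            }
        }, 0, intervalSeconds, TimeUnit.SECONDS);
        return scheduler;
    }
}
```

The polling interval passed to start() would be the frequency configured in solrconfig.xml, and the start/stop operations exposed over REST or JMX would map onto start() and shutting down the returned scheduler.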

What do you think?

