: The "solr" on the rsync command line is just a label which is defined in
: rsyncd.conf on the master.  rsyncd.conf is created on the fly by the script
: rsyncd-start:
        ...
: This label is then mapped to the path defined in $data_dir.

Ah... right, i forgot about that.

: > Why does it need to start an rsyncd in the master in a different port
: > for each ap, is it not enough to call rsync on master:path?

one of the reasons for this approach is to make it easier to run solr in a
somewhat self contained setup .. you don't have to rely on an "external"
(to the Solr install) instance of rsyncd running rooted at base of the
filesystem.  the other nice thing with having a separate rsyncd for each
solr instance is that you can shut off all replication with a single
command on a master solr port (without disabling other solr masters
running on the same machine, or breaking other non-solr uses of rsync on
that machine)

this can be handy when you want to do an upgrade to a solr tier without any
down time:
  1) turn off the master's rsync port
  2) disable snappuller on all of the slaves
  3) shut down and upgrade the master solr port
  4) rebuild the index on the master as needed
  5) run queries against the master to test things are working well.
  6) start the master's rsyncd port
  7) take half of your slaves out of rotation from your load balancer
  8) shut down and upgrade the slaves that are out of rotation
  9) enable snappulling on the slaves that are out of rotation
 10) swap which slaves are in/out of rotation on your load balancer
 11) repeat steps 8 and 9 for the slaves that are now out of rotation
 12) add all slaves back into rotation on your load balancer.
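
to make the ordering concrete, here is a rough dry-run sketch in python that only builds the list of steps and runs nothing. the action names rsyncd-stop, rsyncd-start, snappuller-disable and snappuller-enable are assumed to match the distribution scripts in your install; "upgrade-and-test", "upgrade", "lb-add" and "lb-remove" are made-up placeholders for whatever you do by hand or through your load balancer.

```python
def upgrade_plan(master, slaves):
    """Return the ordered (host, action) pairs for the rolling upgrade.

    Dry run only: nothing is executed. All action names are assumptions
    or placeholders, not a real API.
    """
    half = len(slaves) // 2
    batches = [slaves[:half], slaves[half:]]

    plan = [(master, "rsyncd-stop")]                       # step 1
    plan += [(s, "snappuller-disable") for s in slaves]    # step 2
    plan.append((master, "upgrade-and-test"))              # steps 3-5
    plan.append((master, "rsyncd-start"))                  # step 6

    previous = []
    for batch in batches:
        # steps 7 and 10: bring back the previous batch (empty the
        # first time through), then pull the current batch out
        plan += [(s, "lb-add") for s in previous]
        plan += [(s, "lb-remove") for s in batch]
        # steps 8 and 9 (and their repeat, step 11)
        plan += [(s, "upgrade") for s in batch]
        plan += [(s, "snappuller-enable") for s in batch]
        previous = batch

    # step 12: the last batch goes back into rotation
    plan += [(s, "lb-add") for s in previous]
    return plan


if __name__ == "__main__":
    for host, action in upgrade_plan("master1", ["s1", "s2", "s3", "s4"]):
        print(host, action)
```

the point of the ordering is that the master's rsyncd only comes back up (step 6) after the master has been upgraded and tested, and no slave starts pulling again until it has been upgraded itself.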

...if you had a single rsync port for the entire machine, then this
wouldn't work very cleanly if the machine you were using as the "master"
was hosting more than one solr index (or any other apps using rsync)


-Hoss
