On 2/4/08, Yonik Seeley <[EMAIL PROTECTED]> wrote:
> On Feb 4, 2008 2:20 PM, Rachel McConnell <[EMAIL PROTECTED]> wrote:
> > > If you are running snapshooter asynchronously, this would be the cause.
> > > It's designed to be run from solr (via a postCommit or postOptimize
> > > hook) at specific points where a consistent view of the index is
> > > available.
> >
> > So our cron job might be running DURING an update, for example, and
> > get duplicate values that way?
>
> Right. Duplicates are removed on a commit(), so if a snapshot is
> being taken at any other time than right after a commit, those deletes
> will not have been performed.

I've reviewed the wiki pages about snappuller (http://wiki.apache.org/solr/SolrCollectionDistributionScripts) and solrconfig.xml (http://wiki.apache.org/solr/SolrConfigXml), and it seems that snappuller is intended to be used on the slave server. In our case, the slave servers do no updating and never commit; the master is the only one that commits. Is there a standard way for the just-committed, consistent index to be pushed from the master server out to the slaves?

In fact, I don't see how this is supposed to work in any environment where the master and slave Solr servers are on different physical machines. The postCommit handler should run after a commit, which only happens on the master server; yet it runs snappuller, which should run on a slave.

I am probably missing something here. Is there any more documentation you can point me to?

Rachel