Are you frequently adding and deleting documents and committing those 
mutations? If so, the slave might try to download a file that doesn't exist 
anymore. If that is the case, try increasing:

<str name="maxCommitsToKeep"></str>
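
For reference, that setting lives in the deletion policy section of
solrconfig.xml on the master. A minimal sketch, assuming the stock
SolrDeletionPolicy is in use; the value 5 is only an illustrative guess, not
a recommendation, and the enclosing element differs between Solr versions:

```xml
<!-- Sketch only: keep more commit points around so that a slave which is
     still copying files does not find them deleted by a newer commit.
     The value 5 is an assumed example; tune it to your commit rate. -->
<deletionPolicy class="solr.SolrDeletionPolicy">
  <str name="maxCommitsToKeep">5</str>
</deletionPolicy>
```

Keeping more commit points costs disk space on the master, so raise it only
as far as needed.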

> I noted that in these messages the left hand side is lower case collection,
> but the right hand side is upper case Collection.  Assuming you did a
> cut/paste, could you have a core name mismatch between a master and a
> slave somehow?
> 
> Otherwise (shudder): could you be doing a commit while the replication is
> in progress, causing files to shift about underneath it?  I'd have expected
> (perhaps naively) Solr to have some sort of lock to prevent such a
> problem.  But if there is no internal lock, that would be a serious matter
> (and could happen to us, too, down the road).
> 
> JRJ
> 
> -----Original Message-----
> From: Rob Nicholls [mailto:robst...@hotmail.com]
> Sent: Tuesday, October 25, 2011 10:32 AM
> To: solr-user@lucene.apache.org
> Subject: Replication issues with multiple Slaves
> 
> 
> Hey guys,
> 
> We have a Master (1 server) and 2 Slaves (2 servers) setup and running
> replication across multiple cores.
> 
> However, the replication appears to behave sporadically and often fails
> when left to replicate automatically via poll. More often than not a
> replicate will fail after the slave has finished pulling down the segment
> files, because it cannot find a particular file, giving errors such as:
> 
> Oct 25, 2011 10:00:17 AM org.apache.solr.handler.SnapPuller copyAFile
> SEVERE: Unable to move index file from: D:\web\solr\collection\data\index.20111025100000\_3u.tii to: D:\web\solr\Collection\data\index\_3u.tiiTrying to do a copy
> 
> SEVERE: Unable to copy index file from: D:\web\solr\collection\data\index.20111025100000\_3s.fdt to: D:\web\solr\Collection\data\index\_3s.fdt
> java.io.FileNotFoundException: D:\web\solr\collection\data\index.20111025100000\_3s.fdt (The system cannot find the file specified)
>     at java.io.FileInputStream.open(Native Method)
>     at java.io.FileInputStream.<init>(Unknown Source)
>     at org.apache.solr.common.util.FileUtils.copyFile(FileUtils.java:47)
>     at org.apache.solr.handler.SnapPuller.copyAFile(SnapPuller.java:585)
>     at org.apache.solr.handler.SnapPuller.copyIndexFiles(SnapPuller.java:621)
>     at org.apache.solr.handler.SnapPuller.fetchLatestIndex(SnapPuller.java:317)
>     at org.apache.solr.handler.ReplicationHandler.doFetch(ReplicationHandler.java:267)
>     at org.apache.solr.handler.SnapPuller$1.run(SnapPuller.java:159)
>     at java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source)
>     at java.util.concurrent.FutureTask$Sync.innerRunAndReset(Unknown Source)
>     at java.util.concurrent.FutureTask.runAndReset(Unknown Source)
>     at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$101(Unknown Source)
>     at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.runPeriodic(Unknown Source)
>     at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(Unknown Source)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(Unknown Source)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
>     at java.lang.Thread.run(Unknown Source)
> 
> For these files, I checked the master, and they did indeed exist.
> 
> Both slave machines are configured the same, with the same replication
> settings and a 60-minute poll interval.
> 
> Is it perhaps because both slave machines are trying to pull down files at
> the same time? (and the other has a lock on the file, thus it gets skipped
> maybe?)
> 
> Note: If I manually force replication on each slave, one at a time, the
> replication always seems to work fine.
> 
> 
> 
> Are there any obvious explanations or oddities I should be aware of that
> may cause this?
> 
> Thanks,
> Rob
