Are you frequently adding and deleting documents and committing those mutations? If so, the slave might try to download a file that doesn't exist anymore. If that is the case, try increasing:
<str name="maxCommitsToKeep"></str>

> I noted that in these messages the left hand side is lower case collection,
> but the right hand side is upper case Collection. Assuming you did a
> cut/paste, could you have a core name mismatch between a master and a
> slave somehow?
>
> Otherwise (shudder): could you be doing a commit while the replication is
> in progress, causing files to shift about on it? I'd have expected
> (perhaps naively) solr to have some sort of lock to prevent such a
> problem. But if there is no internal lock, that would be a serious matter
> (and could happen to us, too, down the road).
>
> JRJ
>
> -----Original Message-----
> From: Rob Nicholls [mailto:robst...@hotmail.com]
> Sent: Tuesday, October 25, 2011 10:32 AM
> To: solr-user@lucene.apache.org
> Subject: Replication issues with multiple Slaves
>
>
> Hey guys,
>
> We have a Master (1 server) and 2 Slaves (2 servers) setup and running
> replication across multiple cores.
>
> However, the replication appears to behave sporadically and often fails
> when left to replicate automatically via poll. More often than not a
> replicate will fail after the slave has finished pulling down the segment
> files, because it cannot find a particular file, giving errors such as:
>
> Oct 25, 2011 10:00:17 AM org.apache.solr.handler.SnapPuller copyAFile
> SEVERE: Unable to move index file from:
> D:\web\solr\collection\data\index.20111025100000\_3u.tii to:
> D:\web\solr\Collection\data\index\_3u.tiiTrying to do a copy
>
> SEVERE: Unable to copy index file from:
> D:\web\solr\collection\data\index.20111025100000\_3s.fdt to:
> D:\web\solr\Collection\data\index\_3s.fdt
> java.io.FileNotFoundException:
> D:\web\solr\collection\data\index.20111025100000\_3s.fdt (The system
> cannot find the file specified)
>     at java.io.FileInputStream.open(Native Method)
>     at java.io.FileInputStream.<init>(Unknown Source)
>     at org.apache.solr.common.util.FileUtils.copyFile(FileUtils.java:47)
>     at org.apache.solr.handler.SnapPuller.copyAFile(SnapPuller.java:585)
>     at org.apache.solr.handler.SnapPuller.copyIndexFiles(SnapPuller.java:621)
>     at org.apache.solr.handler.SnapPuller.fetchLatestIndex(SnapPuller.java:317)
>     at org.apache.solr.handler.ReplicationHandler.doFetch(ReplicationHandler.java:267)
>     at org.apache.solr.handler.SnapPuller$1.run(SnapPuller.java:159)
>     at java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source)
>     at java.util.concurrent.FutureTask$Sync.innerRunAndReset(Unknown Source)
>     at java.util.concurrent.FutureTask.runAndReset(Unknown Source)
>     at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$101(Unknown Source)
>     at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.runPeriodic(Unknown Source)
>     at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(Unknown Source)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(Unknown Source)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
>     at java.lang.Thread.run(Unknown Source)
>
> For these files, I checked the master, and they did indeed exist.
>
> Both slave machines are configured the same, with the same replication
> settings and a 60 minutes poll interval.
>
> Is it perhaps because both slave machines are trying to pull down files at
> the same time? (and the other has a lock on the file, thus it gets skipped
> maybe?)
>
> Note: If I manually force replication on each slave, one at a time, the
> replication always seems to work fine.
>
>
>
> Is there any obvious explanation or oddities I should be aware of that may
> cause this?
>
> Thanks,
> Rob
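
For reference, a rough sketch of where the maxCommitsToKeep setting mentioned at the top of this thread normally lives: inside the deletionPolicy element of solrconfig.xml (on Solr 3.x that element sits under mainIndex). This assumes the stock solr.SolrDeletionPolicy is in use on the master. The values shown are illustrative assumptions only; nobody in this thread suggested a specific number.

```xml
<!-- Sketch only: fragment of solrconfig.xml on the master.
     All values are illustrative assumptions, not taken from this thread. -->
<deletionPolicy class="solr.SolrDeletionPolicy">
  <!-- Number of commit points to retain. Raising it keeps older
       segment files around longer for slaves that are still fetching. -->
  <str name="maxCommitsToKeep">3</str>
  <!-- Number of optimized commit points to retain. -->
  <str name="maxOptimizedCommitsToKeep">0</str>
</deletionPolicy>
```

Keeping more commit points costs extra disk space on the master, so it may be worth raising the value gradually and reloading the core after each change to see whether the FileNotFoundException goes away.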