Thanks for that information. It was most useful. Does anyone know: when this happens, does the slave continue using its old index and then try again at the next poll interval? (I sure hope so.)

JRJ

-----Original Message-----
From: Markus Jelsma [mailto:markus.jel...@openindex.io]
Sent: Tuesday, October 25, 2011 3:15 PM
To: solr-user@lucene.apache.org
Subject: Re: Replication issues with multiple Slaves

> 1) Hmm, maybe, didn't notice that... but I'd be very confused why it works
> occasionally, and manual replication (through Solr Admin) always works ok
> in that case?
> 2) This was my initial thought, it was happening on one core (multiple
> commits while replication in progress), but I noticed it happening on
> another core (the one mentioned below) which only had 1 commit and a single
> generation (11 > 12) change to replicate.
>
> I too hoped and presumed that the Master is being Locked while replication
> is copying files... can anyone confirm this? We are using the native Lock
> type on a Windows/Tomcat server.

Replication does not lock the index from being written to.

> Is anyone aware of any reason why the replication skips files, or fails to
> copy/find files other than because of presumably a commit or optimize
> re-chunking the segments and deleting them on the Master?

Slaves receive a list of files to download. Files further down the list may disappear before the slave gets a chance to download them. By keeping older commits we were able to work around this issue.

> -----Original Message-----
> From: Jaeger, Jay - DOT [mailto:jay.jae...@dot.wi.gov]
> Sent: 25 October 2011 20:48
> To: solr-user@lucene.apache.org
> Subject: RE: Replication issues with multiple Slaves
>
> I noted that in these messages the left hand side is lower case collection,
> but the right hand side is upper case Collection. Assuming you did a
> cut/paste, could you have a core name mismatch between a master and a slave
> somehow?
>
> Otherwise (shudder): could you be doing a commit while the replication is
> in progress, causing files to shift about on it? I'd have expected
> (perhaps naively) solr to have some sort of lock to prevent such a
> problem. But if there is no internal lock, that would be a serious matter
> (and could happen to us, too, down the road).
>
> JRJ
>
> -----Original Message-----
> From: Rob Nicholls [mailto:robst...@hotmail.com]
> Sent: Tuesday, October 25, 2011 10:32 AM
> To: solr-user@lucene.apache.org
> Subject: Replication issues with multiple Slaves
>
> Hey guys,
>
> We have a Master (1 server) and 2 Slaves (2 servers) setup and running
> replication across multiple cores.
>
> However, the replication appears to behave sporadically and often fails
> when left to replicate automatically via poll. More often than not a
> replicate will fail after the slave has finished pulling down the segment
> files, because it cannot find a particular file, giving errors such as:
>
> Oct 25, 2011 10:00:17 AM org.apache.solr.handler.SnapPuller copyAFile
> SEVERE: Unable to move index file from:
> D:\web\solr\collection\data\index.20111025100000\_3u.tii to:
> D:\web\solr\Collection\data\index\_3u.tiiTrying to do a copy
>
> SEVERE: Unable to copy index file from:
> D:\web\solr\collection\data\index.20111025100000\_3s.fdt to:
> D:\web\solr\Collection\data\index\_3s.fdt
> java.io.FileNotFoundException:
> D:\web\solr\collection\data\index.20111025100000\_3s.fdt (The system cannot
> find the file specified)
>     at java.io.FileInputStream.open(Native Method)
>     at java.io.FileInputStream.<init>(Unknown Source)
>     at org.apache.solr.common.util.FileUtils.copyFile(FileUtils.java:47)
>     at org.apache.solr.handler.SnapPuller.copyAFile(SnapPuller.java:585)
>     at org.apache.solr.handler.SnapPuller.copyIndexFiles(SnapPuller.java:621)
>     at org.apache.solr.handler.SnapPuller.fetchLatestIndex(SnapPuller.java:317)
>     at org.apache.solr.handler.ReplicationHandler.doFetch(ReplicationHandler.java:267)
>     at org.apache.solr.handler.SnapPuller$1.run(SnapPuller.java:159)
>     at java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source)
>     at java.util.concurrent.FutureTask$Sync.innerRunAndReset(Unknown Source)
>     at java.util.concurrent.FutureTask.runAndReset(Unknown Source)
>     at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$101(Unknown Source)
>     at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.runPeriodic(Unknown Source)
>     at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(Unknown Source)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(Unknown Source)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
>     at java.lang.Thread.run(Unknown Source)
>
> For these files, I checked the master, and they did indeed exist.
>
> Both slave machines are configured the same, with the same replication
> settings and a 60 minutes poll interval.
>
> Is it perhaps because both slave machines are trying to pull down files at
> the same time? (and the other has a lock on the file, thus it gets skipped
> maybe?)
>
> Note: If I manually force replication on each slave, one at a time, the
> replication always seems to work fine.
>
> Is there any obvious explanation or oddities I should be aware of that may
> cause this?
>
> Thanks,
> Rob
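
A note on the "keeping older commits" workaround from Markus's reply: the thread does not say how it was configured. One plausible way, assuming a Solr 3.x master that uses the stock SolrDeletionPolicy in the index section of solrconfig.xml, is to retain more than one commit point, so that files belonging to the commit a slave is still fetching are not deleted partway through the download. The numbers below are illustrative assumptions, not settings taken from this thread.

```xml
<!-- Master solrconfig.xml, index configuration section.
     Sketch only: the values are assumed, not taken from the thread. -->
<deletionPolicy class="solr.SolrDeletionPolicy">
  <!-- Keep several recent commit points instead of only the latest one -->
  <str name="maxCommitsToKeep">5</str>
  <!-- Also keep a couple of optimized commit points -->
  <str name="maxOptimizedCommitsToKeep">2</str>
  <!-- Remove commit points once they reach this age, to bound disk usage -->
  <str name="maxCommitAge">1DAY</str>
</deletionPolicy>
```

Retained commit points cost extra disk space on the master, so the counts and age are probably best sized to how long a slave takes to pull a full set of files.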