RE: Replication issues with multiple Slaves

Jaeger, Jay - DOT Wed, 26 Oct 2011 05:46:29 -0700

Thanks for that information.  It was most useful.  

Does anyone know:  when this happens does the slave continue using its old 
index, and then try again at the next time interval?  (I sure hope so).


JRJ

-----Original Message-----
From: Markus Jelsma [mailto:markus.jel...@openindex.io] 
Sent: Tuesday, October 25, 2011 3:15 PM
To: solr-user@lucene.apache.org
Subject: Re: Replication issues with multiple Slaves


> 1) Hmm, maybe, didn't notice that... but I'd be very confused why it works
> occasionally, and manual replication (through Solr Admin) always works ok
> in that case?
> 2) This was my initial thought, it was happening on one core (multiple
> commits while replication in progress), but I noticed it happening on
> another core (the one mentioned below) which only had 1 commit and a single
> generation (11 > 12) change to replicate.
> 
> 
> I too hoped and presumed that the Master is being Locked while replication
> is copying files... can anyone confirm this? We are using the native Lock
> type on a Windows/Tomcat server.

Replication does not lock the index from being written to.

> 
> Is anyone aware of any reason why the replication skips files, or fails to
> copy/find files other than because of presumably a commit or optimize
> re-chunking the segments and deleting them on the Master?

Slaves receive a list of files to download. Files further down the list may 
disappear before the slave gets a chance to download them. By keeping older 
commits we were able to work around this issue.
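
A minimal sketch of what keeping older commits can look like on the master, 
assuming it is done through the index deletion policy in solrconfig.xml. The 
thread does not say which settings were actually used, so the values below 
are illustrative assumptions, and where the element sits in solrconfig.xml 
depends on the Solr version:

```xml
<!-- Master solrconfig.xml. Values are assumptions for illustration only. -->
<deletionPolicy class="solr.SolrDeletionPolicy">
  <!-- Keep more than the latest commit point, so files a slave was told
       about are not deleted as soon as the next commit happens. -->
  <str name="maxCommitsToKeep">3</str>
  <str name="maxOptimizedCommitsToKeep">1</str>
  <!-- Optionally drop retained commit points once they are older than this. -->
  <str name="maxCommitAge">1DAY</str>
</deletionPolicy>
```

Retained commit points cost extra disk space on the master, so the number to 
keep is a trade-off against commit frequency and how long a slave takes to 
pull a full file list.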

> 
> -----Original Message-----
> From: Jaeger, Jay - DOT [mailto:jay.jae...@dot.wi.gov]
> Sent: 25 October 2011 20:48
> To: solr-user@lucene.apache.org
> Subject: RE: Replication issues with multiple Slaves
> 
> I noted that in these messages the left hand side is lower case collection,
> but the right hand side is upper case Collection.  Assuming you did a
> cut/paste, could you have a core name mismatch between a master and a slave
> somehow?
> 
> Otherwise (shudder):  could you be doing a commit while the replication is
> in progress, causing files to shift about on it?  I'd have expected
> (perhaps naively) solr to have some sort of lock to prevent such a
> problem.  But if there is no internal lock, that would be a serious matter
> (and could happen to us, too, down the road).
> 
> JRJ
> 
> -----Original Message-----
> From: Rob Nicholls [mailto:robst...@hotmail.com]
> Sent: Tuesday, October 25, 2011 10:32 AM
> To: solr-user@lucene.apache.org
> Subject: Replication issues with multiple Slaves
> 
> 
> Hey guys,
> 
> We have a Master (1 server) and 2 Slaves (2 servers) setup and running
> replication across multiple cores.
> 
> However, the replication appears to behave sporadically and often fails
> when left to replicate automatically via poll. More often than not a
> replicate will fail after the slave has finished pulling down the segment
> files, because it cannot find a particular file, giving errors such as:
> 
> Oct 25, 2011 10:00:17 AM org.apache.solr.handler.SnapPuller copyAFile
> SEVERE: Unable to move index file from:
> D:\web\solr\collection\data\index.20111025100000\_3u.tii to:
> D:\web\solr\Collection\data\index\_3u.tiiTrying to do a copy
> 
> SEVERE: Unable to copy index file from:
> D:\web\solr\collection\data\index.20111025100000\_3s.fdt to:
> D:\web\solr\Collection\data\index\_3s.fdt
> java.io.FileNotFoundException:
> D:\web\solr\collection\data\index.20111025100000\_3s.fdt (The system cannot
> find the file specified)
>     at java.io.FileInputStream.open(Native Method)
>     at java.io.FileInputStream.<init>(Unknown Source)
>     at org.apache.solr.common.util.FileUtils.copyFile(FileUtils.java:47)
>     at org.apache.solr.handler.SnapPuller.copyAFile(SnapPuller.java:585)
>     at org.apache.solr.handler.SnapPuller.copyIndexFiles(SnapPuller.java:621)
>     at org.apache.solr.handler.SnapPuller.fetchLatestIndex(SnapPuller.java:317)
>     at org.apache.solr.handler.ReplicationHandler.doFetch(ReplicationHandler.java:267)
>     at org.apache.solr.handler.SnapPuller$1.run(SnapPuller.java:159)
>     at java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source)
>     at java.util.concurrent.FutureTask$Sync.innerRunAndReset(Unknown Source)
>     at java.util.concurrent.FutureTask.runAndReset(Unknown Source)
>     at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$101(Unknown Source)
>     at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.runPeriodic(Unknown Source)
>     at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(Unknown Source)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(Unknown Source)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
>     at java.lang.Thread.run(Unknown Source)
> 
> For these files, I checked the master, and they did indeed exist.
> 
> Both slave machines are configured the same, with the same replication
> settings and a 60 minutes poll interval.
> 
> Is it perhaps because both slave machines are trying to pull down files at
> the same time? (and the other has a lock on the file, thus it gets skipped
> maybe?)
> 
> Note: If I manually force replication on each slave, one at a time, the
> replication always seems to work fine.
> 
> 
> 
> Is there any obvious explanation or oddities I should be aware of that may
> cause this?
> 
> Thanks,
> Rob
