Maduranga Kannangara wrote:
> Permanent solution we found was to add:
>
> 1. flush() before closing the segments.gen file write (in Lucene).
>
Hmm ... but close does flush?
> 2. Remove the slave's segments.gen before replication.
>
>
> Point 1 elaborated:
>
> Lucene 2.4, org.apache.lucene.index.SegmentInfos.finishCommit(Directory dir)
> method:
>
> Writing of the segments.gen file was changed to:
>
> public final void finishCommit(Directory dir) throws IOException {
>     .
>     .
>     .
>
>     try {
>         IndexOutput genOutput = dir.createOutput(IndexFileNames.SEGMENTS_GEN);
>         try {
>             genOutput.writeInt(FORMAT_LOCKLESS);
>             genOutput.writeLong(generation);
>             genOutput.writeLong(generation);
>         } finally {
>             genOutput.flush(); // this is the simple change!
>             genOutput.close();
>         }
>     } catch (Throwable t) {
>         // It's OK if we fail to write this file since it's
>         // used only as one of the retry fallbacks.
>     }
>
> }
>
>
> I believe, if this makes sense, we should add this simple line in Lucene! :-)
>
>
> However, Java replication in Solr 1.4, an application-level process,
> should already have solved this issue in another way as well.
> Yet to test it.
>
>
> Thanks
> Madu
>
>
> -----Original Message-----
> From: Maduranga Kannangara
> Sent: Monday, 16 November 2009 2:39 PM
> To: solr-user@lucene.apache.org
> Subject: RE: Segment file not found error - after replicating
>
> Yes, I too believed so.
>
> The logic in the method mentioned earlier does the "gen number calculation"
> using the segments files available (genA) and the segments.gen file content
> (genB). Whichever is larger is the gen number used to look up the segments
> file.
>
> When the file is not properly replicated (because it has not been written to
> hard disk, or has not been rsynced) and the gen number in the segments.gen
> file (genB) is larger than the file-based calculation (genA), we hit the
> issue described earlier.
>
> Cheers
> Madu
>
>
> -----Original Message-----
> From: Mark Miller [mailto:markrmil...@gmail.com]
> Sent: Monday, 16 November 2009 2:19 PM
> To: solr-user@lucene.apache.org
> Subject: Re: Segment file not found error - after replicating
>
> That's odd - that file is normally not used - it's a backup method to
> figure out the current generation in case it cannot be determined with a
> directory listing - it's basically for NFS.
>
> Maduranga Kannangara wrote:
>
>> Just found out the root cause:
>>
>> * The segments.gen file does not get replicated to the slave all the time.
>>
>> For some reason, this small (20 bytes) file lives in memory and does not get
>> written to the master's hard disk. Therefore it is obviously not transferred
>> to the slaves.
>>
>> The solution was to shut down the master web app (it must be a clean
>> shutdown, not a kill of Tomcat) and then do the replication.
>>
>> Also, if the timestamp/size (the size won't change anyway!) has not changed,
>> rsync does not seem to copy this file over either. So enforcing the copy in
>> the replication scripts solved the problem.
>>
>> Thanks Otis and everyone for all your support!
>>
>> Madu
>>
>>
>> -----Original Message-----
>> From: Maduranga Kannangara
>> Sent: Monday, 16 November 2009 12:37 PM
>> To: solr-user@lucene.apache.org
>> Subject: RE: Segment file not found error - after replicating
>>
>> Yes. We have tried Solr 1.4 and so far it has been a great success.
>>
>> I am still investigating why Solr 1.3 gave an issue like this.
>>
>> Currently it seems to me that
>> org.apache.lucene.index.SegmentInfos.FindSegmentsFile.run() is not able to
>> figure out the correct segments file name. (Maybe an index replication issue
>> leading to "not fully replicated", but that is hard to believe as both
>> master and slave have 100% the same data now!)
>>
>> Anyway, I will keep on trying till I find something useful, and will let you
>> know.
>>
>>
>> Thanks
>> Madu
>>
>>
>> -----Original Message-----
>> From: Otis Gospodnetic [mailto:otis_gospodne...@yahoo.com]
>> Sent: Wednesday, 11 November 2009 10:03 AM
>> To: solr-user@lucene.apache.org
>> Subject: Re: Segment file not found error - after replicating
>>
>> It sounds like your index is not being fully replicated. I can't tell why,
>> but I can suggest you try the new Solr 1.4 replication.
>>
>> Otis
>> --
>> Sematext is hiring -- http://sematext.com/about/jobs.html?mls
>> Lucene, Solr, Nutch, Katta, Hadoop, HBase, UIMA, NLP, NER, IR
>>
>>
>>
>> ----- Original Message ----
>>
>>
>>> From: Maduranga Kannangara <mkannang...@infomedia.com.au>
>>> To: "solr-user@lucene.apache.org" <solr-user@lucene.apache.org>
>>> Sent: Tue, November 10, 2009 5:42:44 PM
>>> Subject: RE: Segment file not found error - after replicating
>>>
>>> Thanks Otis,
>>>
>>> I ran du -s on all three index directories as you said, right after
>>> replicating and when I found errors.
>>>
>>> All three gave me the exact same value. This time I found the error in a
>>> rather small index too (31 MB).
>>>
>>> BTW, if I copy the segments_x file to the name Solr is looking for and
>>> restart the Solr web app from the Tomcat manager, the error goes away. But
>>> it's just a workaround, never good enough for production deployments.
>>>
>>> My next plan is to do a remote debug to see what exactly is happening in the
>>> code.
>>>
>>> Are there any other things I should be looking at?
>>> Any help is really appreciated on this matter.
>>>
>>> Thanks
>>> Madu
>>>
>>>
>>> -----Original Message-----
>>> From: Otis Gospodnetic [mailto:otis_gospodne...@yahoo.com]
>>> Sent: Tuesday, 10 November 2009 1:14 PM
>>> To: solr-user@lucene.apache.org
>>> Subject: Re: Segment file not found error - after replicating
>>>
>>> Madu,
>>>
>>> So are you saying that all slaves have the exact same index, and that index
>>> is exactly the same as the one on the master, yet only some of those slaves
>>> exhibit this error, while others do not? Mind listing the index directories
>>> of 1) the master, 2) a slave without errors, 3) a slave with errors, and
>>> doing:
>>> du -s /path/to/index/on/master
>>> du -s /path/to/index/on/slave/without/errors
>>> du -s /path/to/index/on/slave/with/errors
>>>
>>>
>>> Otis
>>> --
>>> Sematext is hiring -- http://sematext.com/about/jobs.html?mls
>>> Lucene, Solr, Nutch, Katta, Hadoop, HBase, UIMA, NLP, NER, IR
>>>
>>>
>>>
>>> ----- Original Message ----
>>>
>>>
>>>> From: Maduranga Kannangara
>>>> To: "solr-user@lucene.apache.org"
>>>> Sent: Mon, November 9, 2009 7:47:04 PM
>>>> Subject: RE: Segment file not found error - after replicating
>>>>
>>>> Thanks Otis!
>>>>
>>>> Yes, I checked the index directories and they are 100% the same, both
>>>> timestamp-wise and size-wise.
>>>>
>>>> Not all the slaves face this issue. I would say roughly 50% have this
>>>> trouble.
>>>>
>>>> The logs do not have any errors either :-(
>>>>
>>>> Any other things I should do/look at?
>>>>
>>>> Cheers
>>>> Madu
>>>>
>>>>
>>>> -----Original Message-----
>>>> From: Otis Gospodnetic [mailto:otis_gospodne...@yahoo.com]
>>>> Sent: Tuesday, 10 November 2009 9:26 AM
>>>> To: solr-user@lucene.apache.org
>>>> Subject: Re: Segment file not found error - after replicating
>>>>
>>>> It's hard to troubleshoot blindly like this, but have you tried manually
>>>> comparing the contents of the index dir on the master and on the slave(s)?
>>>> If they are out of sync, have you tried forcing replication to see if
>>>> one of the subsequent replication attempts gets the dirs in sync?
>>>> Do you have more than 1 slave and do they all start having this problem at
>>>> the same time?
>>>> Any errors in the logs for any of the scripts involved in replication in
>>>> 1.3?
>>>>
>>>> Otis
>>>> --
>>>> Sematext is hiring -- http://sematext.com/about/jobs.html?mls
>>>> Lucene, Solr, Nutch, Katta, Hadoop, HBase, UIMA, NLP, NER, IR
>>>>
>>>>
>>>>
>>>> ----- Original Message ----
>>>>
>>>>
>>>>> From: Maduranga Kannangara
>>>>> To: "solr-user@lucene.apache.org"
>>>>> Sent: Sun, November 8, 2009 10:30:44 PM
>>>>> Subject: Segment file not found error - after replicating
>>>>>
>>>>> Hi guys,
>>>>>
>>>>> We use Solr 1.3 for indexing large amounts of data (50 GB avg) in a Linux
>>>>> environment and use the replication scripts to make replicas that live on
>>>>> load-balancing slaves.
>>>>>
>>>>> The issue we face quite often (only on Linux servers) is that the slaves
>>>>> are unable to find the segments file (segments_x etc.) after replication
>>>>> has completed. As this has become quite common, it is turning into a
>>>>> serious issue.
>>>>>
>>>>> Below is a stack trace, if that helps. Any help on this matter is greatly
>>>>> appreciated.
>>>>>
>>>>> --------------------------------
>>>>>
>>>>> Nov 5, 2009 11:34:46 PM org.apache.solr.util.plugin.AbstractPluginLoader load
>>>>> INFO: created /admin/: org.apache.solr.handler.admin.AdminHandlers
>>>>> Nov 5, 2009 11:34:46 PM org.apache.solr.util.plugin.AbstractPluginLoader load
>>>>> INFO: created /admin/ping: org.apache.solr.handler.PingRequestHandler
>>>>> Nov 5, 2009 11:34:46 PM org.apache.solr.util.plugin.AbstractPluginLoader load
>>>>> INFO: created /debug/dump: org.apache.solr.handler.DumpRequestHandler
>>>>> Nov 5, 2009 11:34:46 PM org.apache.solr.util.plugin.AbstractPluginLoader load
>>>>> INFO: created gap: org.apache.solr.highlight.GapFragmenter
>>>>> Nov 5, 2009 11:34:46 PM org.apache.solr.util.plugin.AbstractPluginLoader load
>>>>> INFO: created regex: org.apache.solr.highlight.RegexFragmenter
>>>>> Nov 5, 2009 11:34:46 PM org.apache.solr.util.plugin.AbstractPluginLoader load
>>>>> INFO: created html: org.apache.solr.highlight.HtmlFormatter
>>>>> Nov 5, 2009 11:34:46 PM org.apache.solr.servlet.SolrDispatchFilter init
>>>>> SEVERE: Could not start SOLR. Check solr/home property
>>>>> java.lang.RuntimeException: java.io.FileNotFoundException: /solrinstances/solrhome01/data/index/segments_v (No such file or directory)
>>>>>     at org.apache.solr.core.SolrCore.getSearcher(SolrCore.java:960)
>>>>>     at org.apache.solr.core.SolrCore.<init>(SolrCore.java:470)
>>>>>     at org.apache.solr.core.CoreContainer$Initializer.initialize(CoreContainer.java:119)
>>>>>     at org.apache.solr.servlet.SolrDispatchFilter.init(SolrDispatchFilter.java:69)
>>>>>     at org.apache.catalina.core.ApplicationFilterConfig.getFilter(ApplicationFilterConfig.java:275)
>>>>>     at org.apache.catalina.core.ApplicationFilterConfig.setFilterDef(ApplicationFilterConfig.java:397)
>>>>>     at org.apache.catalina.core.ApplicationFilterConfig.<init>(ApplicationFilterConfig.java:108)
>>>>>     at org.apache.catalina.core.StandardContext.filterStart(StandardContext.java:3709)
>>>>>     at org.apache.catalina.core.StandardContext.start(StandardContext.java:4363)
>>>>>     at org.apache.catalina.core.StandardContext.reload(StandardContext.java:3099)
>>>>>     at org.apache.catalina.manager.ManagerServlet.reload(ManagerServlet.java:916)
>>>>>     at org.apache.catalina.manager.HTMLManagerServlet.reload(HTMLManagerServlet.java:536)
>>>>>     at org.apache.catalina.manager.HTMLManagerServlet.doGet(HTMLManagerServlet.java:114)
>>>>>     at javax.servlet.http.HttpServlet.service(HttpServlet.java:617)
>>>>>     at javax.servlet.http.HttpServlet.service(HttpServlet.java:717)
>>>>>     at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:290)
>>>>>     at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
>>>>>     at com.jamonapi.JAMonFilter.doFilter(JAMonFilter.java:57)
>>>>>     at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
>>>>>     at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
>>>>>     at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
>>>>>     at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191)
>>>>>     at org.apache.catalina.authenticator.AuthenticatorBase.invoke(AuthenticatorBase.java:525)
>>>>>     at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:128)
>>>>>     at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
>>>>>     at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
>>>>>     at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:286)
>>>>>     at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:845)
>>>>>     at org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:583)
>>>>>     at org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:447)
>>>>>     at java.lang.Thread.run(Thread.java:619)
>>>>> Caused by: java.io.FileNotFoundException: /solrinstances/solrhome01/data/index/segments_v (No such file or directory)
>>>>>     at java.io.RandomAccessFile.open(Native Method)
>>>>>     at java.io.RandomAccessFile.<init>(RandomAccessFile.java:212)
>>>>>     at org.apache.lucene.store.FSDirectory$FSIndexInput$Descriptor.<init>(FSDirectory.java:552)
>>>>>     at org.apache.lucene.store.FSDirectory$FSIndexInput.<init>(FSDirectory.java:582)
>>>>>     at org.apache.lucene.store.FSDirectory.openInput(FSDirectory.java:488)
>>>>>     at org.apache.lucene.store.FSDirectory.openInput(FSDirectory.java:482)
>>>>>     at org.apache.lucene.index.SegmentInfos.read(SegmentInfos.java:214)
>>>>>     at org.apache.lucene.index.DirectoryIndexReader$1.doBody(DirectoryIndexReader.java:94)
>>>>>     at org.apache.lucene.index.SegmentInfos$FindSegmentsFile.run(SegmentInfos.java:653)
>>>>>     at org.apache.lucene.index.DirectoryIndexReader.open(DirectoryIndexReader.java:111)
>>>>>     at org.apache.lucene.index.IndexReader.open(IndexReader.java:316)
>>>>>     at org.apache.lucene.index.IndexReader.open(IndexReader.java:237)
>>>>>     at org.apache.solr.core.SolrCore.getSearcher(SolrCore.java:951)
>>>>>     ... 30 more
>>>>> Nov 5, 2009 11:34:46 PM org.apache.solr.common.SolrException log
>>>>> SEVERE: java.lang.RuntimeException: java.io.FileNotFoundException: /solrinstances/solrhome01/data/index/segments_v (No such file or directory)
>>>>>     at org.apache.solr.core.SolrCore.getSearcher(SolrCore.java:960)
>>>>>     at org.apache.solr.core.SolrCore.<init>(SolrCore.java:470)
>>>>>     at org.apache.solr.core.CoreContainer$Initializer.initialize(CoreContainer.java:119)
>>>>>     at org.apache.solr.servlet.SolrDispatchFilter.init(SolrDispatchFilter.java:69)
>>>>>     at org.apache.catalina.core.ApplicationFilterConfig.getFilter(ApplicationFilterConfig.java:275)
>>>>>     at org.apache.catalina.core.ApplicationFilterConfig.setFilterDef(ApplicationFilterConfig.java:397)
>>>>>     at org.apache.catalina.core.ApplicationFilterConfig.<init>(ApplicationFilterConfig.java:108)
>>>>>     at org.apache.catalina.core.StandardContext.filterStart(StandardContext.java:3709)
>>>>>     at org.apache.catalina.core.StandardContext.start(StandardContext.java:4363)
>>>>>     at org.apache.catalina.core.StandardContext.reload(StandardContext.java:3099)
>>>>>     at org.apache.catalina.manager.ManagerServlet.reload(ManagerServlet.java:916)
>>>>>     at org.apache.catalina.manager.HTMLManagerServlet.reload(HTMLManagerServlet.java:536)
>>>>>     at org.apache.catalina.manager.HTMLManagerServlet.doGet(HTMLManagerServlet.java:114)
>>>>>     at javax.servlet.http.HttpServlet.service(HttpServlet.java:617)
>>>>>     at javax.servlet.http.HttpServlet.service(HttpServlet.java:717)
>>>>>     at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:290)
>>>>>     at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
>>>>>     at com.jamonapi.JAMonFilter.doFilter(JAMonFilter.java:57)
>>>>>     at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
>>>>>     at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
>>>>>     at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
>>>>>     at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191)
>>>>>     at org.apache.catalina.authenticator.AuthenticatorBase.invoke(AuthenticatorBase.java:525)
>>>>>     at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:128)
>>>>>     at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
>>>>>     at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
>>>>>     at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:286)
>>>>>     at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:845)
>>>>>     at org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:583)
>>>>>     at org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:447)
>>>>>     at java.lang.Thread.run(Thread.java:619)
>>>>> Caused by: java.io.FileNotFoundException: /solrinstances/solrhome01/data/index/segments_v (No such file or directory)
>>>>>     at java.io.RandomAccessFile.open(Native Method)
>>>>>     at java.io.RandomAccessFile.<init>(RandomAccessFile.java:212)
>>>>>     at org.apache.lucene.store.FSDirectory$FSIndexInput$Descriptor.<init>(FSDirectory.java:552)
>>>>>     at org.apache.lucene.store.FSDirectory$FSIndexInput.<init>(FSDirectory.java:582)
>>>>>     at org.apache.lucene.store.FSDirectory.openInput(FSDirectory.java:488)
>>>>>     at org.apache.lucene.store.FSDirectory.openInput(FSDirectory.java:482)
>>>>>     at org.apache.lucene.index.SegmentInfos.read(SegmentInfos.java:214)
>>>>>     at org.apache.lucene.index.DirectoryIndexReader$1.doBody(DirectoryIndexReader.java:94)
>>>>>     at org.apache.lucene.index.SegmentInfos$FindSegmentsFile.run(SegmentInfos.java:653)
>>>>>     at org.apache.lucene.index.DirectoryIndexReader.open(DirectoryIndexReader.java:111)
>>>>>     at org.apache.lucene.index.IndexReader.open(IndexReader.java:316)
>>>>>     at org.apache.lucene.index.IndexReader.open(IndexReader.java:237)
>>>>>     at org.apache.solr.core.SolrCore.getSearcher(SolrCore.java:951)
>>>>>     ... 30 more
>>>>>
>>>>> Nov 5, 2009 11:34:46 PM org.apache.solr.servlet.SolrDispatchFilter init
>>>>> INFO: SolrDispatchFilter.init() done
>>>>> Nov 5, 2009 11:34:46 PM org.apache.solr.servlet.SolrServlet init
>>>>> INFO: SolrServlet.init()
>>>>>
>>>>> --------------------------------
>>>>>
>>>>> Steps to reproduce the error (however, they did not work for me on my
>>>>> local box, and the remote server is too far away to remote-debug!):
>>>>>
>>>>> - Post some new data to the master server (usually about 1 GB worth of
>>>>>   text files)
>>>>> - Run the replicate script on the slave Solr instance
>>>>> - Try to log in to admin on the slave Solr instance
>>>>>
>>>>> You should then see the above stack trace, even in the Tomcat output.
>>>>>
>>>>>
>>>>> Thanks in advance.
>>>>> Madu
>>>>>
>>>>>
>>
>
>
>
> --
> - Mark
>
> http://www.lucidimagination.com
>
>
>

--
- Mark

http://www.lucidimagination.com
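
The generation lookup discussed in this thread (genA from the directory listing, genB from segments.gen, the larger one wins) can be sketched roughly as below. This is only an illustration for checking a slave's index directory by hand, not Lucene's actual code: the class SegmentsGenCheck and its method names are hypothetical, the value -2 for FORMAT_LOCKLESS and the base-36 file suffix are assumptions about Lucene 2.x, and the 20-byte layout (one int, two longs) is taken from the patch quoted above. The retry and fallback logic of the real FindSegmentsFile is left out.

```java
import java.nio.ByteBuffer;

// Hypothetical diagnostic helper; not part of Lucene or Solr.
class SegmentsGenCheck {

    // Assumed value of SegmentInfos.FORMAT_LOCKLESS in Lucene 2.x.
    static final int FORMAT_LOCKLESS = -2;
    static final String PREFIX = "segments_";

    // genA: highest generation found among segments_N file names, or -1.
    static long genFromListing(String[] fileNames) {
        long max = -1;
        for (String name : fileNames) {
            if (!name.startsWith(PREFIX)) {
                continue; // skips segments.gen and all non-commit files
            }
            try {
                // Assumption: the suffix is the generation written in base 36.
                long gen = Long.parseLong(name.substring(PREFIX.length()), Character.MAX_RADIX);
                max = Math.max(max, gen);
            } catch (NumberFormatException e) {
                // Not a commit file name; ignore it.
            }
        }
        return max;
    }

    // genB: generation recorded in segments.gen, or -1 if the bytes are unusable.
    static long genFromGenFile(byte[] contents) {
        if (contents.length < 20) {
            return -1; // expected layout: int format + two longs
        }
        ByteBuffer buf = ByteBuffer.wrap(contents); // big-endian by default
        if (buf.getInt() != FORMAT_LOCKLESS) {
            return -1;
        }
        long first = buf.getLong();
        long second = buf.getLong();
        return first == second ? first : -1; // the two copies must agree
    }

    // File name a reader would try first, or null if no generation is known.
    static String expectedSegmentsFile(long genA, long genB) {
        long gen = Math.max(genA, genB);
        return gen < 0 ? null : PREFIX + Long.toString(gen, Character.MAX_RADIX);
    }
}
```

For illustration: with a listing that contains only segments_u and a segments.gen that records generation 31, expectedSegmentsFile returns segments_v, a file that is not in the listing. That is the kind of mismatch (genB larger than genA) described above as leading to the FileNotFoundException.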