Hi All,

I found a file corruption issue that occurs when "EmbeddedSolrServer" and 
"Solr 1.4 Java-based replication" are used together on a slave server.


On my slave server, I have 2 webapps in one Tomcat instance: 
1) the "multicore" webapp with the slave config
2) "my custom" webapp, which uses EmbeddedSolrServer to query the Solr index data.
 
Both webapps were set up according to the instructions on the Solr wiki.
However, I found a multi-threading issue that causes index file 
corruption.

The following is the root cause:
EmbeddedSolrServer requires a CoreContainer object as a parameter. 
However, while the CoreContainer object is being created, the process loads the 
slave Solr configuration, which silently creates an extra ReplicationHandler 
(SnapPuller) in the background. A ReplicationHandler (SnapPuller) has 
already been created by the multicore webapp because of its slave configuration.

As a result, 2 threads replicate files at the same time, which 
corrupts the index and produces various IOExceptions.
After I replaced EmbeddedSolrServer with CommonsHttpSolrServer 
(so that no CoreContainer object is created on the slave server), Solr 1.4 
Java-based replication works perfectly without any file corruption.

In order to use EmbeddedSolrServer on a slave server, I think we need a 
way to create a CoreContainer object with the slave configuration without 
creating an extra thread to replicate files.
Should I file a bug?
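
To make the idea concrete, here is a minimal sketch of the kind of guard I have 
in mind. It is not Solr code: ReplicationGuard, tryClaim and release are 
hypothetical names. It also assumes the class is loaded once per JVM (for 
example from a library directory shared by both webapps), because static state 
is per class loader and would otherwise not be shared between the two webapps.

```java
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

/**
 * Hypothetical helper, not part of Solr: records which index directories
 * already have a replication poller in this JVM, so that a second
 * CoreContainer could skip starting its own.
 * Assumption: loaded by a class loader shared by all webapps.
 */
final class ReplicationGuard {
    // Thread-safe set of index directories that currently have an owner.
    private static final Set<Path> OWNED = ConcurrentHashMap.newKeySet();

    private ReplicationGuard() {}

    /** Returns true only for the first caller that claims this directory. */
    static boolean tryClaim(String indexDir) {
        return OWNED.add(normalize(indexDir));
    }

    /** Releases the claim so that another poller may take over. */
    static void release(String indexDir) {
        OWNED.remove(normalize(indexDir));
    }

    private static Path normalize(String indexDir) {
        // Normalize so that "/data/core1/index" and "/data/core1/./index" match.
        return Paths.get(indexDir).toAbsolutePath().normalize();
    }
}
```

With something like this, the code that starts the poller would call 
tryClaim(indexDir) first and start the replication thread only if it returns 
true; the embedded CoreContainer would get false and stay read-only.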

Thanks,

Osborn



-----Original Message-----
From: Osborn Chan [mailto:oc...@shutterfly.com] 
Sent: Friday, January 15, 2010 12:35 PM
To: solr-user@lucene.apache.org
Subject: RE: Index Courruption after replication by new Solr 1.4 Replication

Hi Otis,

Thanks. There is no NFS anymore, and all index files are local. We migrated to 
the new Solr 1.4 replication in order to avoid all the NFS stale exceptions. 

Thanks,

Osborn

-----Original Message-----
From: Otis Gospodnetic [mailto:otis_gospodne...@yahoo.com] 
Sent: Friday, January 15, 2010 12:31 PM
To: solr-user@lucene.apache.org
Subject: Re: Index Courruption after replication by new Solr 1.4 Replication

This is not a direct answer to your question, but can you avoid NFS?  My first 
guess would be that NFS somehow causes this problem.  If you check the ML 
archives for: NFS lock , you will see what I mean.

Otis
--
Sematext -- http://sematext.com/ -- Solr - Lucene - Nutch



----- Original Message ----
> From: Osborn Chan <oc...@shutterfly.com>
> To: "solr-user@lucene.apache.org" <solr-user@lucene.apache.org>
> Sent: Fri, January 15, 2010 3:23:21 PM
> Subject: Index Courruption after replication by new Solr 1.4 Replication
> 
> Hi all,
> 
> I recently migrated from Solr 1.2 with NFS mounting to the new Solr 1.4 
> Replication feature with multicore support. The following exceptions appear in 
> catalina.log from time to time, and there are some EOF exceptions which I 
> believe mean the slave index files are corrupted after replication from the 
> index server. 
> I have the following configuration with Solr 1.4; please correct me if it is 
> configured incorrectly. 
> 
> (The index files are not corrupted on the master server, but they are corrupted 
> on the slave servers. Usually only one of the slave servers is corrupted with 
> the EOF exception, not all of them.)
> 
> 1 Master Server: (Index Server)
>     - 8 indexes with multicore configuration.
>     - All indexes are configured to "replicateAfter" optimize only.
>     - The size of the index data varies. The smallest index is only 2.5 MB. 
> The biggest index is ~100 MB. 
>     - There are infrequent optimize calls to the indexes (an optimize call 
> every ~30 mins to 6 hours, depending on the index).
>     - There are many commit calls to all indexes. (But there is no concurrent 
> commit and optimize for any index.)
>     - "commitReserveDuration" is not configured in ReplicationHandler; the 
> default value is used.
> 
> 4 Slave Servers (Search Server)
>     - 8 indexes with multicore configuration.
>     - All indexes are configured to poll every ~15 minutes.
>     - All update handler configurations are removed from solrconfig-slave.xml 
> (solrconfig.xml) in order to prevent add/commit/optimize calls. 
>     - (The search slave servers are only responsible for search operations.)
>         -  removed.
>         - 
> removed.
>         - 
> class="solr.BinaryUpdateRequestHandler" /> removed.
> 
> A) FileNotFoundException
> 
> INFO: Total time taken for download : 1 secs
> Jan 15, 2010 10:34:16 AM org.apache.solr.handler.ReplicationHandler doFetch
> SEVERE: SnapPull failed
> org.apache.solr.common.SolrException: Index fetch failed :
>         at 
> org.apache.solr.handler.SnapPuller.fetchLatestIndex(SnapPuller.java:329)
>         at 
> org.apache.solr.handler.ReplicationHandler.doFetch(ReplicationHandler.java:264)
>         at org.apache.solr.handler.SnapPuller$1.run(SnapPuller.java:159)
>         at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:417)
>         at 
> java.util.concurrent.FutureTask$Sync.innerRunAndReset(FutureTask.java:280)
>         at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:135)
>         at 
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$101(ScheduledThreadPoolExecutor.java:65)
>         at 
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.runPeriodic(ScheduledThreadPoolExecutor.java:142)
>         at 
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:166)
>         at 
> java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:650)
>         at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:675)
>         at java.lang.Thread.run(Thread.java:595)
> Caused by: java.io.FileNotFoundException: File does not exist 
> /slaveIndexData/publicGalleryTagDef/index.20100115103415/_al.fdx
>         at org.apache.solr.common.util.FileUtils.sync(FileUtils.java:55)
>         at 
> org.apache.solr.handler.SnapPuller$FileFetcher$1.run(SnapPuller.java:911)
>         at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:417)
>         at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:269)
>         at java.util.concurrent.FutureTask.run(FutureTask.java:123)
>         ... 3 more
> Jan 15, 2010 10:34:17 AM org.apache.solr.core.SolrCore execute
> INFO: [publicGalleryPostMaster] webapp=/multicore path=/select 
> params={wt=javabin&rows=10&start=0&sort=createTime_dt+desc&q=%2B(profileId_s:/community/sfly/publicprofile/0AcM27Nw3aNWLi4)+%2Bstate_s:A&version=1}
>  
> hits=1 status=0 QTime=1
> 
> B) LockReleaseFailedException
> 
> SEVERE: SnapPull failed
> org.apache.solr.common.SolrException: Index fetch failed :
>         at 
> org.apache.solr.handler.SnapPuller.fetchLatestIndex(SnapPuller.java:329)
>         at 
> org.apache.solr.handler.ReplicationHandler.doFetch(ReplicationHandler.java:264)
>         at org.apache.solr.handler.SnapPuller$1.run(SnapPuller.java:159)
>         at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:417)
>         at 
> java.util.concurrent.FutureTask$Sync.innerRunAndReset(FutureTask.java:280)
>         at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:135)
>         at 
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$101(ScheduledThreadPoolExecutor.java:65)
>         at 
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.runPeriodic(ScheduledThreadPoolExecutor.java:142)
>         at 
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:166)
>         at 
> java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:650)
>         at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:675)
>         at java.lang.Thread.run(Thread.java:595)
> Caused by: org.apache.lucene.store.LockReleaseFailedException: failed to 
> delete 
> /slaveIndexData/publicGalleryTagDefAggregate/index/lucene-fb30bdbbdc6927666873dd616884ba29-write.lock
>         at 
> org.apache.lucene.store.NativeFSLock.release(NativeFSLockFactory.java:298)
>         at 
> org.apache.lucene.index.IndexWriter.closeInternal(IndexWriter.java:2225)
>         at org.apache.lucene.index.IndexWriter.close(IndexWriter.java:2153)
>         at org.apache.lucene.index.IndexWriter.close(IndexWriter.java:2117)
>         at 
> org.apache.solr.update.SolrIndexWriter.close(SolrIndexWriter.java:229)
>         at 
> org.apache.solr.update.DirectUpdateHandler2.closeWriter(DirectUpdateHandler2.java:181)
>         at 
> org.apache.solr.update.DirectUpdateHandler2.commit(DirectUpdateHandler2.java:409)
>         at org.apache.solr.handler.SnapPuller.doCommit(SnapPuller.java:467)
>         at 
> org.apache.solr.handler.SnapPuller.fetchLatestIndex(SnapPuller.java:319)
>         ... 11 more
> Jan 15, 2010 12:21:18 AM org.apache.solr.handler.SnapPuller fetchLatestIndex
> INFO: Slave in sync with master.
> 
> C) EOF Exception
> INFO: [publicGalleryPostMaster] webapp=/multicore path=/select 
> params={wt=javabin&rows=1&start=0&sort=createTime_dt+desc&q=%2B(profileId_s:/community/sfly/publicprofile/0AbOWLNszaOWTiw)+%2B(lastBookmarked_dt:[2010-01-08T08:49:38.271Z+TO+2010-01-15T08:49:38.271Z]+lastCommented_dt:[2010-01-08T08:49:38.271Z+TO+2010-01-15T08:49:38.271Z])+%2Bstate_s:A&version=1}
>  
> hits=0 status=0 QTime=2
> Jan 15, 2010 12:49:42 AM org.apache.solr.common.SolrException log
> SEVERE: java.io.IOException: read past EOF
>         at 
> org.apache.lucene.store.BufferedIndexInput.refill(BufferedIndexInput.java:151)
>         at 
> org.apache.lucene.store.BufferedIndexInput.readByte(BufferedIndexInput.java:38)
>         at org.apache.lucene.store.IndexInput.readVInt(IndexInput.java:80)
>         at org.apache.lucene.index.TermBuffer.read(TermBuffer.java:64)
>         at 
> org.apache.lucene.index.SegmentTermEnum.next(SegmentTermEnum.java:129)
>         at 
> org.apache.lucene.index.SegmentTermEnum.scanTo(SegmentTermEnum.java:160)
>         at 
> org.apache.lucene.index.TermInfosReader.get(TermInfosReader.java:232)
>         at 
> org.apache.lucene.index.TermInfosReader.get(TermInfosReader.java:179)
>         at 
> org.apache.lucene.index.SegmentReader.docFreq(SegmentReader.java:975)
>         at 
> org.apache.lucene.index.DirectoryReader.docFreq(DirectoryReader.java:627)
>         at 
> org.apache.solr.search.SolrIndexReader.docFreq(SolrIndexReader.java:308)
>         at 
> org.apache.lucene.search.IndexSearcher.docFreq(IndexSearcher.java:147)
>         at org.apache.lucene.search.Similarity.idfExplain(Similarity.java:833)
>         at 
> org.apache.lucene.search.PhraseQuery$PhraseWeight.<init>(PhraseQuery.java:122)
>         at 
> org.apache.lucene.search.PhraseQuery.createWeight(PhraseQuery.java:250)
>         at 
> org.apache.lucene.search.BooleanQuery$BooleanWeight.<init>(BooleanQuery.java:184)
>         at 
> org.apache.lucene.search.BooleanQuery.createWeight(BooleanQuery.java:415)
>         at org.apache.lucene.search.Query.weight(Query.java:99)
>         at org.apache.lucene.search.Searcher.createWeight(Searcher.java:230)
>         at org.apache.lucene.search.Searcher.search(Searcher.java:171)
>         at 
> org.apache.solr.search.SolrIndexSearcher.getDocListNC(SolrIndexSearcher.java:988)
>         at 
> org.apache.solr.search.SolrIndexSearcher.getDocListC(SolrIndexSearcher.java:884)
>         at 
> org.apache.solr.search.SolrIndexSearcher.search(SolrIndexSearcher.java:341)
>         at 
> org.apache.solr.handler.component.QueryComponent.process(QueryComponent.java:182)
>         at 
> org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:195)
>         at 
> org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
>         at org.apache.solr.core.SolrCore.execute(SolrCore.java:1316)
>         at 
> org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:338)
>         at 
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:241)
>         at 
> org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:202)
>         at 
> org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:173)
>         at 
> org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:213)
>         at 
> org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:178)
>         at 
> org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:126)
>         at 
> org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:105)
>         at 
> org.apache.catalina.valves.AccessLogValve.invoke(AccessLogValve.java:541)
>         at 
> org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:107)
>         at 
> org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:148)
>         at 
> org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:869)
>         at 
> org.apache.coyote.http11.Http11BaseProtocol$Http11ConnectionHandler.processConnection(Http11BaseProtocol.java:664)
>         at 
> org.apache.tomcat.util.net.PoolTcpEndpoint.processSocket(PoolTcpEndpoint.java:527)
>         at 
> org.apache.tomcat.util.net.LeaderFollowerWorkerThread.runIt(LeaderFollowerWorkerThread.java:80)
>         at 
> org.apache.tomcat.util.threads.ThreadPool$ControlRunnable.run(ThreadPool.java:684)
>         at java.lang.Thread.run(Thread.java:595)
> 
> Thanks a lot!
> 
> Osborn
