Hi Erick,

Thank you very much for your reply.
I disabled client-side commits and configured auto commits in solrconfig.xml as follows:

     <autoCommit>
       <maxTime>${solr.autoCommit.maxTime:300000}</maxTime>
       <openSearcher>false</openSearcher>
     </autoCommit>

     <autoSoftCommit>
       <maxTime>${solr.autoSoftCommit.maxTime:60000}</maxTime>
     </autoSoftCommit>

The picture has changed for the better: no more index corruption or endless
replication attempts. Sixteen hours after start-up, with more than 142k
tweets downloaded, all shards and replicas are "active".

One problem remains, though. While auto committing, Solr logs the following
stack trace:

00:00:40,383 ERROR [org.apache.solr.update.CommitTracker] (commitScheduler-25-thread-1) auto commit error...: org.apache.solr.common.SolrException: Error opening new searcher
    at org.apache.solr.core.SolrCore.openNewSearcher(SolrCore.java:1550)
    at org.apache.solr.core.SolrCore.getSearcher(SolrCore.java:1662)
    at org.apache.solr.update.DirectUpdateHandler2.commit(DirectUpdateHandler2.java:603)
    at org.apache.solr.update.CommitTracker.run(CommitTracker.java:216)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
    at java.util.concurrent.FutureTask.run(FutureTask.java:262)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:178)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:292)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.RuntimeException: java.io.FileNotFoundException: _1.nvm
    at org.apache.lucene.index.TieredMergePolicy$SegmentByteSizeDescending.compare(TieredMergePolicy.java:252)
    at org.apache.lucene.index.TieredMergePolicy$SegmentByteSizeDescending.compare(TieredMergePolicy.java:238)
    at java.util.TimSort.countRunAndMakeAscending(TimSort.java:324)
    at java.util.TimSort.sort(TimSort.java:203)
    at java.util.TimSort.sort(TimSort.java:173)
    at java.util.Arrays.sort(Arrays.java:659)
    at java.util.Collections.sort(Collections.java:217)
    at org.apache.lucene.index.TieredMergePolicy.findMerges(TieredMergePolicy.java:286)
    at org.apache.lucene.index.IndexWriter.updatePendingMerges(IndexWriter.java:2017)
    at org.apache.lucene.index.IndexWriter.maybeMerge(IndexWriter.java:1986)
    at org.apache.lucene.index.IndexWriter.getReader(IndexWriter.java:407)
    at org.apache.lucene.index.StandardDirectoryReader.doOpenFromWriter(StandardDirectoryReader.java:287)
    at org.apache.lucene.index.StandardDirectoryReader.doOpenIfChanged(StandardDirectoryReader.java:272)
    at org.apache.lucene.index.DirectoryReader.openIfChanged(DirectoryReader.java:251)
    at org.apache.solr.core.SolrCore.openNewSearcher(SolrCore.java:1461)
    ... 10 more
Caused by: java.io.FileNotFoundException: _1.nvm
    at org.apache.lucene.store.FSDirectory.fileLength(FSDirectory.java:260)
    at org.apache.lucene.store.NRTCachingDirectory.fileLength(NRTCachingDirectory.java:177)
    at org.apache.lucene.index.SegmentCommitInfo.sizeInBytes(SegmentCommitInfo.java:141)
    at org.apache.lucene.index.MergePolicy.size(MergePolicy.java:513)
    at org.apache.lucene.index.TieredMergePolicy$SegmentByteSizeDescending.compare(TieredMergePolicy.java:242)
    ... 24 more

The file "_1.nvm" once existed; it was deleted during an auto commit but
apparently remains somewhere in a queue for deletion. I believe the
consequence is that in the SolrCloud Admin UI -> Core Admin -> Stats, the
"Current" status is off for replica number 3 of all shards. If I understand
correctly, this means that changes to the index are not becoming visible.

Once again I searched for possible causes of this situation, but none of
the threads I found seems to match my case.

My lock type is set to <lockType>${solr.lock.type:single}</lockType>. This
is due to a lock wait timeout error with both "native" and "simple" when
trying to create the collection using the Collections API. There is a thread
discussing this issue:

http://lucene.472066.n3.nabble.com/unable-to-load-core-after-cluster-restart-td4098731.html

The catch is that "single" should only be used when "there is no
possibility of another process trying to modify the index", and I cannot
guarantee that. Could that be the cause of the FileNotFoundException?
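For reference, once the lock wait timeout is sorted out I would switch back
to the default lock type. A sketch of the relevant solrconfig.xml fragment
(the property name follows the stock example configs; "native" is the
default and relies on OS-level file locking):

```xml
<indexConfig>
  <!-- "native" uses OS file locking and is the safe choice when another
       process could conceivably open the same index directory. -->
  <lockType>${solr.lock.type:native}</lockType>
</indexConfig>
```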

Thanks once again for your help.

Regards,
Bruno.



2014-11-08 18:36 GMT-02:00 Erick Erickson <erickerick...@gmail.com>:

> First, for tweets, committing every 500 docs is much too frequent.
> Especially from the client and super-especially if you have multiple
> clients running. I'd recommend you just configure solrconfig this way
> as a place to start and do NOT commit from any clients.
> 1> a hard commit (openSearcher=false) every minute (or maybe 5 minutes)
> 2> a soft commit every minute
>
> This latter governs how long it'll be between when a doc is indexed and
> when it can be searched.
>
> Here's a long post about how all this works:
>
> https://lucidworks.com/blog/understanding-transaction-logs-softcommit-and-commit-in-sorlcloud/
>
>
> As far as the rest, it's a puzzle definitely. If it continues, a complete
> stack
> trace would be a good thing to start with.
>
> Best,
> Erick
>
> On Sat, Nov 8, 2014 at 9:47 AM, Bruno Osiek <baos...@gmail.com> wrote:
> > Hi,
> >
> > I am a newbie SolrCloud enthusiast. My goal is to implement an
> > infrastructure to enable text analysis (clustering, classification,
> > information extraction, sentiment analysis, etc).
> >
> > My development environment consists of one machine, quad-core processor,
> > 16GB RAM and 1TB HD.
> >
> > Have started implementing Apache Flume, Twitter as source and SolrCloud
> > (within JBoss AS 7) as sink. Using Zookeeper (5 servers) to upload
> > configuration and managing cluster.
> >
> > The pseudo-distributed cluster consists of one collection, three shards
> > each with three replicas.
> >
> > Everything runs smoothly for a while. After 50.000 tweets committed
> > (actually CloudSolrServer commits every batch consisting of 500
> documents)
> > randomly SolrCloud starts logging exceptions: Lucene file not found,
> > IndexWriter cannot be opened, replication unsuccessful and the likes.
> > Recovery starts with no success until replica goes down.
> >
> > Have tried different Solr versions (4.10.2, 4.9.1 and lastly 4.8.1) with
> > same results.
> >
> > I have looked everywhere for help before writing this email. My guess
> right
> > now is that the problem lies with SolrCloud and Zookeeper connection,
> > although haven't seen any such exception.
> >
> > Any reference or help will be welcomed.
> >
> > Cheers,
> > B.
>
