Erick,

Once again, thank you very much for your attention.
Now my pseudo-distributed SolrCloud is configured with no inconsistency. An additional problem was that JBoss was started with "solr.data.dir" set to a path Solr did not expect (it was not even underneath the solr.home directory). This thread ( http://mail-archives.apache.org/mod_mbox/lucene-solr-user/201206.mbox/%3ccao8xr5zv8o-s6zn7ypaxpzpourqjknbsm59mbe6h3dpfykg...@mail.gmail.com%3E ) explains the inconsistency. I found no need to change the Solr data directory. After commenting out this property in JBoss's standalone.xml and setting "<lockType>${solr.lock.type:native}</lockType>", everything started to work properly.

Regards,
Bruno

2014-11-09 14:35 GMT-02:00 Erick Erickson <erickerick...@gmail.com>:
> OK, we're _definitely_ in the speculative realm here, so don't think I know more than I do ;)...
>
> The next thing I'd try is to go back to "native" as the lock type, on the theory that the lock type wasn't your problem, it was the too-frequent commits.
>
> bq: This file "_1.nvm" once existed. Was deleted during one auto commit, but remains somewhere in a queue for deletion
>
> Assuming Unix, this is entirely expected. Searchers have all the files open. Commits do background merges, which may delete segments. So the current searcher may have the file open even though it's been "merged away". When the searcher closes, the file will actually truly disappear.
>
> It's more complicated on Windows, but eventually that's what happens.
>
> Anyway, keep us posted. If this continues to occur, please open a new thread; that might catch the eye of people who are deep into Lucene file locking...
>
> Best,
> Erick
>
> On Sun, Nov 9, 2014 at 6:45 AM, Bruno Osiek <baos...@gmail.com> wrote:
> > Hi Erick,
> >
> > Thank you very much for your reply. I disabled client commits while setting commits in solrconfig.xml as follows:
> >
> > <autoCommit>
> >   <maxTime>${solr.autoCommit.maxTime:300000}</maxTime>
> >   <openSearcher>false</openSearcher>
> > </autoCommit>
> >
> > <autoSoftCommit>
> >   <maxTime>${solr.autoSoftCommit.maxTime:60000}</maxTime>
> > </autoSoftCommit>
> >
> > The picture changed for the better. No more index corruption or endless replication attempts, and up till now (16 hours since start-up, more than 142k tweets downloaded) shards and replicas are "active".
> >
> > One problem remains, though.
> > While auto committing, Solr logs the following stack trace:
> >
> > 00:00:40,383 ERROR [org.apache.solr.update.CommitTracker] (commitScheduler-25-thread-1) auto commit error...:org.apache.solr.common.SolrException: *Error opening new searcher*
> >   at org.apache.solr.core.SolrCore.openNewSearcher(SolrCore.java:1550)
> >   at org.apache.solr.core.SolrCore.getSearcher(SolrCore.java:1662)
> >   at org.apache.solr.update.DirectUpdateHandler2.commit(DirectUpdateHandler2.java:603)
> >   at org.apache.solr.update.CommitTracker.run(CommitTracker.java:216)
> >   at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
> >   at java.util.concurrent.FutureTask.run(FutureTask.java:262)
> >   at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:178)
> >   at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:292)
> >   at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> >   at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> >   at java.lang.Thread.run(Thread.java:745)
> > *Caused by: java.lang.RuntimeException: java.io.FileNotFoundException: _1.nvm*
> >   at org.apache.lucene.index.TieredMergePolicy$SegmentByteSizeDescending.compare(TieredMergePolicy.java:252)
> >   at org.apache.lucene.index.TieredMergePolicy$SegmentByteSizeDescending.compare(TieredMergePolicy.java:238)
> >   at java.util.TimSort.countRunAndMakeAscending(TimSort.java:324)
> >   at java.util.TimSort.sort(TimSort.java:203)
> >   at java.util.TimSort.sort(TimSort.java:173)
> >   at java.util.Arrays.sort(Arrays.java:659)
> >   at java.util.Collections.sort(Collections.java:217)
> >   at org.apache.lucene.index.TieredMergePolicy.findMerges(TieredMergePolicy.java:286)
> >   at org.apache.lucene.index.IndexWriter.updatePendingMerges(IndexWriter.java:2017)
> >   at org.apache.lucene.index.IndexWriter.maybeMerge(IndexWriter.java:1986)
> >   at org.apache.lucene.index.IndexWriter.getReader(IndexWriter.java:407)
> >   at org.apache.lucene.index.StandardDirectoryReader.doOpenFromWriter(StandardDirectoryReader.java:287)
> >   at org.apache.lucene.index.StandardDirectoryReader.doOpenIfChanged(StandardDirectoryReader.java:272)
> >   at org.apache.lucene.index.DirectoryReader.openIfChanged(DirectoryReader.java:251)
> >   at org.apache.solr.core.SolrCore.openNewSearcher(SolrCore.java:1461)
> >   ... 10 more
> > *Caused by: java.io.FileNotFoundException: _1.nvm*
> >   at org.apache.lucene.store.FSDirectory.fileLength(FSDirectory.java:260)
> >   at org.apache.lucene.store.NRTCachingDirectory.fileLength(NRTCachingDirectory.java:177)
> >   at org.apache.lucene.index.SegmentCommitInfo.sizeInBytes(SegmentCommitInfo.java:141)
> >   at org.apache.lucene.index.MergePolicy.size(MergePolicy.java:513)
> >   at org.apache.lucene.index.TieredMergePolicy$SegmentByteSizeDescending.compare(TieredMergePolicy.java:242)
> >   ... 24 more
> >
> > This file "_1.nvm" once existed. It was deleted during one auto commit, but remains somewhere in a queue for deletion. I believe the consequence is that in the SolrCloud Admin UI -> Core Admin -> Stats, the "Current" status is off for all shards' replica number 3. If I understand correctly, this means that changes to the index are not becoming visible.
> >
> > Once again I tried to find possible reasons for that situation, but none of the threads I found seems to reflect my case.
> >
> > My lock type is set to: <lockType>${solr.lock.type:single}</lockType>. This is due to a lock.wait timeout error with both "native" and "simple" when trying to create the collection using the commands API. There is a thread discussing this issue:
> >
> > http://lucene.472066.n3.nabble.com/unable-to-load-core-after-cluster-restart-td4098731.html
> >
> > The only thing is that "single" should only be used if "there is no possibility of another process trying to modify the index", and I cannot guarantee that. Could that be the cause of the file-not-found exception?
> >
> > Thanks once again for your help.
> >
> > Regards,
> > Bruno.
> >
> > 2014-11-08 18:36 GMT-02:00 Erick Erickson <erickerick...@gmail.com>:
> >
> >> First: for tweets, committing every 500 docs is much too frequent, especially from the client and super-especially if you have multiple clients running. I'd recommend you just configure solrconfig this way as a place to start and do NOT commit from any clients:
> >> 1> a hard commit (openSearcher=false) every minute (or maybe 5 minutes)
> >> 2> a soft commit every minute
> >>
> >> This latter governs how long it'll be between when a doc is indexed and when it can be searched.
> >>
> >> Here's a long post about how all this works:
> >> https://lucidworks.com/blog/understanding-transaction-logs-softcommit-and-commit-in-sorlcloud/
> >>
> >> As far as the rest, it's a puzzle, definitely. If it continues, a complete stack trace would be a good thing to start with.
> >>
> >> Best,
> >> Erick
> >>
> >> On Sat, Nov 8, 2014 at 9:47 AM, Bruno Osiek <baos...@gmail.com> wrote:
> >> > Hi,
> >> >
> >> > I am a newbie SolrCloud enthusiast. My goal is to implement an infrastructure to enable text analysis (clustering, classification, information extraction, sentiment analysis, etc).
> >> >
> >> > My development environment consists of one machine: quad-core processor, 16GB RAM and a 1TB HD.
> >> >
> >> > I have started implementing Apache Flume with Twitter as the source and SolrCloud (within JBoss AS 7) as the sink, using Zookeeper (5 servers) to upload configuration and manage the cluster.
> >> >
> >> > The pseudo-distributed cluster consists of one collection with three shards, each with three replicas.
> >> >
> >> > Everything runs smoothly for a while. After about 50,000 tweets committed (actually CloudSolrServer commits every batch of 500 documents), SolrCloud randomly starts logging exceptions: Lucene file not found, IndexWriter cannot be opened, replication unsuccessful and the like. Recovery starts with no success until the replica goes down.
> >> >
> >> > I have tried different Solr versions (4.10.2, 4.9.1 and lastly 4.8.1) with the same results.
> >> >
> >> > I have looked everywhere for help before writing this email. My guess right now is that the problem lies with the SolrCloud and Zookeeper connection, although I haven't seen any such exception.
> >> >
> >> > Any reference or help will be welcomed.
> >> >
> >> > Cheers,
> >> > B.
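
P.S. For anyone who finds this thread in the archives later, this is roughly what the relevant pieces of my solrconfig.xml look like after the changes above. It is only a sketch: the lockType and the commit intervals are the values quoted earlier in this thread, while the surrounding element layout (indexConfig and the DirectUpdateHandler2 updateHandler block) follows the stock Solr 4.x example config, and anything not shown is left at its defaults.

  <indexConfig>
    <!-- Back on the native lock factory, as Erick suggested; the lock.wait
         timeouts went away once solr.data.dir stopped pointing at an
         unexpected path. -->
    <lockType>${solr.lock.type:native}</lockType>
  </indexConfig>

  <updateHandler class="solr.DirectUpdateHandler2">
    <!-- Hard commit every 5 minutes without opening a new searcher,
         so segments get flushed but searchers are not churned. -->
    <autoCommit>
      <maxTime>${solr.autoCommit.maxTime:300000}</maxTime>
      <openSearcher>false</openSearcher>
    </autoCommit>
    <!-- Soft commit every minute; this controls how soon newly indexed
         tweets become searchable. No commits are sent from the client
         (CloudSolrServer) any more. -->
    <autoSoftCommit>
      <maxTime>${solr.autoSoftCommit.maxTime:60000}</maxTime>
    </autoSoftCommit>
  </updateHandler>

The hard commit interval mainly bounds how much work is replayed from the transaction log on restart, while the soft commit interval is the visibility latency, which is the distinction Erick's linked post explains in detail.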