Glad to hear that! Thanks for closing this out.

Best,
Erick
On Sun, Nov 9, 2014 at 4:55 PM, Bruno Osiek <baos...@gmail.com> wrote:
> Erick,
>
> Once again, thank you very much for your attention.
>
> Now my pseudo-distributed SolrCloud is configured with no inconsistency. An
> additional problem was starting JBoss with "solr.data.dir" set to a path
> not expected by Solr (actually it was not even underneath the solr.home
> directory).
>
> This thread (
> http://mail-archives.apache.org/mod_mbox/lucene-solr-user/201206.mbox/%3ccao8xr5zv8o-s6zn7ypaxpzpourqjknbsm59mbe6h3dpfykg...@mail.gmail.com%3E)
> explains the inconsistency.
>
> I found no need to change the Solr data directory. After commenting out
> this property in JBoss' standalone.xml and setting
> "<lockType>${solr.lock.type:native}</lockType>", everything started to
> work properly.
>
> Regards,
> Bruno
>
>
> 2014-11-09 14:35 GMT-02:00 Erick Erickson <erickerick...@gmail.com>:
>
>> OK, we're _definitely_ in the speculative realm here, so don't think
>> I know more than I do ;)...
>>
>> The next thing I'd try is to go back to "native" as the lock type, on the
>> theory that the lock type wasn't your problem; it was the too-frequent
>> commits.
>>
>> bq: This file "_1.nvm" once existed. Was deleted during one auto commit,
>> but remains somewhere in a queue for deletion
>>
>> Assuming Unix, this is entirely expected. Searchers have all the files
>> open. Commits do background merges, which may delete segments. So the
>> current searcher may have the file open even though it's been "merged
>> away". When the searcher closes, the file will actually, truly disappear.
>>
>> It's more complicated on Windows, but eventually that's what happens.
>>
>> Anyway, keep us posted. If this continues to occur, please open a new
>> thread; that might catch the eye of people who are deep into Lucene file
>> locking...
>>
>> Best,
>> Erick
>>
>> On Sun, Nov 9, 2014 at 6:45 AM, Bruno Osiek <baos...@gmail.com> wrote:
>> > Hi Erick,
>> >
>> > Thank you very much for your reply.
>> > I disabled client commits while setting commits in solrconfig.xml as
>> > follows:
>> >
>> > <autoCommit>
>> >   <maxTime>${solr.autoCommit.maxTime:300000}</maxTime>
>> >   <openSearcher>false</openSearcher>
>> > </autoCommit>
>> >
>> > <autoSoftCommit>
>> >   <maxTime>${solr.autoSoftCommit.maxTime:60000}</maxTime>
>> > </autoSoftCommit>
>> >
>> > The picture has changed for the better: no more index corruption or
>> > endless replication attempts, and up till now, 16 hours since start-up
>> > and more than 142k tweets downloaded, all shards and replicas are
>> > "active".
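>> >
>> > For reference, the indexing client is now roughly along these lines
>> > (the ZooKeeper address, collection name and field values are
>> > placeholders), with no explicit commit() calls:
>> >
>> > import org.apache.solr.client.solrj.SolrServerException;
>> > import org.apache.solr.client.solrj.impl.CloudSolrServer;
>> > import org.apache.solr.common.SolrInputDocument;
>> >
>> > import java.io.IOException;
>> >
>> > public class TweetIndexer {
>> >     public static void main(String[] args)
>> >             throws IOException, SolrServerException {
>> >         // ZooKeeper ensemble and collection name are placeholders
>> >         CloudSolrServer server =
>> >                 new CloudSolrServer("zk1:2181,zk2:2181,zk3:2181");
>> >         server.setDefaultCollection("tweets");
>> >
>> >         SolrInputDocument doc = new SolrInputDocument();
>> >         doc.addField("id", "tweet-12345");          // illustrative values
>> >         doc.addField("text", "sample tweet text");
>> >         server.add(doc);   // no commit() here; hard/soft commits are
>> >                            // driven entirely by solrconfig.xml
>> >
>> >         server.shutdown();
>> >     }
>> > }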
>> >
>> > One problem remains, though. While auto-committing, Solr logs the
>> > following stack trace:
>> >
>> > 00:00:40,383 ERROR [org.apache.solr.update.CommitTracker]
>> > (commitScheduler-25-thread-1) auto commit
>> > error...:org.apache.solr.common.SolrException: *Error opening new searcher*
>> > at org.apache.solr.core.SolrCore.openNewSearcher(SolrCore.java:1550)
>> > at org.apache.solr.core.SolrCore.getSearcher(SolrCore.java:1662)
>> > at org.apache.solr.update.DirectUpdateHandler2.commit(DirectUpdateHandler2.java:603)
>> > at org.apache.solr.update.CommitTracker.run(CommitTracker.java:216)
>> > at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
>> > at java.util.concurrent.FutureTask.run(FutureTask.java:262)
>> > at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:178)
>> > at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:292)
>> > at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>> > at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>> > at java.lang.Thread.run(Thread.java:745)
>> > *Caused by: java.lang.RuntimeException: java.io.FileNotFoundException: _1.nvm*
>> > at org.apache.lucene.index.TieredMergePolicy$SegmentByteSizeDescending.compare(TieredMergePolicy.java:252)
>> > at org.apache.lucene.index.TieredMergePolicy$SegmentByteSizeDescending.compare(TieredMergePolicy.java:238)
>> > at java.util.TimSort.countRunAndMakeAscending(TimSort.java:324)
>> > at java.util.TimSort.sort(TimSort.java:203)
>> > at java.util.TimSort.sort(TimSort.java:173)
>> > at java.util.Arrays.sort(Arrays.java:659)
>> > at java.util.Collections.sort(Collections.java:217)
>> > at org.apache.lucene.index.TieredMergePolicy.findMerges(TieredMergePolicy.java:286)
>> > at org.apache.lucene.index.IndexWriter.updatePendingMerges(IndexWriter.java:2017)
>> > at org.apache.lucene.index.IndexWriter.maybeMerge(IndexWriter.java:1986)
>> > at org.apache.lucene.index.IndexWriter.getReader(IndexWriter.java:407)
>> > at org.apache.lucene.index.StandardDirectoryReader.doOpenFromWriter(StandardDirectoryReader.java:287)
>> > at org.apache.lucene.index.StandardDirectoryReader.doOpenIfChanged(StandardDirectoryReader.java:272)
>> > at org.apache.lucene.index.DirectoryReader.openIfChanged(DirectoryReader.java:251)
>> > at org.apache.solr.core.SolrCore.openNewSearcher(SolrCore.java:1461)
>> > ... 10 more
>> > *Caused by: java.io.FileNotFoundException: _1.nvm*
>> > at org.apache.lucene.store.FSDirectory.fileLength(FSDirectory.java:260)
>> > at org.apache.lucene.store.NRTCachingDirectory.fileLength(NRTCachingDirectory.java:177)
>> > at org.apache.lucene.index.SegmentCommitInfo.sizeInBytes(SegmentCommitInfo.java:141)
>> > at org.apache.lucene.index.MergePolicy.size(MergePolicy.java:513)
>> > at org.apache.lucene.index.TieredMergePolicy$SegmentByteSizeDescending.compare(TieredMergePolicy.java:242)
>> > ... 24 more
>> >
>> > This file "_1.nvm" once existed. It was deleted during one auto commit,
>> > but remains somewhere in a queue for deletion. I believe the consequence
>> > is that in the SolrCloud Admin UI -> Core Admin -> Stats, the "Current"
>> > status is off for all shards' replica number 3. If I understand
>> > correctly, this means that changes to the index are not becoming visible.
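>> >
>> > One way to check whether a given replica actually sees the latest
>> > updates (the core URL below is illustrative) is to query each core
>> > directly with distrib=false and compare numFound across replicas, e.g.:
>> >
>> > import org.apache.solr.client.solrj.SolrQuery;
>> > import org.apache.solr.client.solrj.SolrServerException;
>> > import org.apache.solr.client.solrj.impl.HttpSolrServer;
>> >
>> > public class ReplicaDocCount {
>> >     public static void main(String[] args) throws SolrServerException {
>> >         // point straight at a single replica core; URL is a placeholder
>> >         HttpSolrServer core = new HttpSolrServer(
>> >                 "http://localhost:8080/solr/collection1_shard1_replica3");
>> >         SolrQuery q = new SolrQuery("*:*");
>> >         q.setRows(0);
>> >         q.set("distrib", "false"); // count only this core's own index
>> >         long numFound = core.query(q).getResults().getNumFound();
>> >         System.out.println("docs visible on this replica: " + numFound);
>> >         core.shutdown();
>> >     }
>> > }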
>> >
>> > Once again I tried to find possible reasons for that situation, but
>> > none of the threads I found seems to reflect my case.
>> >
>> > My lock type is set to: <lockType>${solr.lock.type:single}</lockType>.
>> > This is due to a lock.wait timeout error with both "native" and "simple"
>> > when trying to create the collection using the Collections API. There is
>> > a thread discussing this issue:
>> >
>> > http://lucene.472066.n3.nabble.com/unable-to-load-core-after-cluster-restart-td4098731.html
>> >
>> > The only thing is that "single" should only be used if "there is no
>> > possibility of another process trying to modify the index", and I
>> > cannot guarantee that. Could that be the cause of the file-not-found
>> > exception?
>> >
>> > Thanks once again for your help.
>> >
>> > Regards,
>> > Bruno.
>> >
>> >
>> > 2014-11-08 18:36 GMT-02:00 Erick Erickson <erickerick...@gmail.com>:
>> >
>> >> First: for tweets, committing every 500 docs is much too frequent,
>> >> especially from the client and super-especially if you have multiple
>> >> clients running. I'd recommend you just configure solrconfig this way
>> >> as a place to start and do NOT commit from any clients:
>> >> 1> a hard commit (openSearcher=false) every minute (or maybe 5 minutes)
>> >> 2> a soft commit every minute
>> >>
>> >> The latter governs how long it'll be between when a doc is indexed and
>> >> when it can be searched.
>> >>
>> >> Here's a long post about how all this works:
>> >>
>> >> https://lucidworks.com/blog/understanding-transaction-logs-softcommit-and-commit-in-sorlcloud/
>> >>
>> >> As far as the rest, it's a puzzle, definitely. If it continues, a
>> >> complete stack trace would be a good thing to start with.
>> >>
>> >> Best,
>> >> Erick
>> >>
>> >> On Sat, Nov 8, 2014 at 9:47 AM, Bruno Osiek <baos...@gmail.com> wrote:
>> >> > Hi,
>> >> >
>> >> > I am a newbie SolrCloud enthusiast. My goal is to implement an
>> >> > infrastructure to enable text analysis (clustering, classification,
>> >> > information extraction, sentiment analysis, etc.).
>> >> >
>> >> > My development environment consists of one machine: quad-core
>> >> > processor, 16GB RAM and 1TB HD.
>> >> >
>> >> > I have started implementing Apache Flume with Twitter as source and
>> >> > SolrCloud (within JBoss AS 7) as sink, using ZooKeeper (5 servers) to
>> >> > upload the configuration and manage the cluster.
>> >> >
>> >> > The pseudo-distributed cluster consists of one collection with three
>> >> > shards, each with three replicas.
>> >> >
>> >> > Everything runs smoothly for a while. After 50,000 tweets committed
>> >> > (actually CloudSolrServer commits every batch consisting of 500
>> >> > documents), SolrCloud randomly starts logging exceptions: Lucene file
>> >> > not found, IndexWriter cannot be opened, replication unsuccessful and
>> >> > the like. Recovery starts with no success until the replica goes down.
>> >> >
>> >> > I have tried different Solr versions (4.10.2, 4.9.1 and lastly 4.8.1)
>> >> > with the same results.
>> >> >
>> >> > I have looked everywhere for help before writing this email. My guess
>> >> > right now is that the problem lies with the connection between
>> >> > SolrCloud and ZooKeeper, although I haven't seen any such exception.
>> >> >
>> >> > Any reference or help will be welcomed.
>> >> >
>> >> > Cheers,
>> >> > B.
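>> >> >
>> >> > For reference, a one-collection, three-shard, three-replica layout on
>> >> > a single node is typically created with a Collections API call along
>> >> > these lines (host, port and names are illustrative; maxShardsPerNode
>> >> > has to be raised so that all nine cores fit on the one node):
>> >> >
>> >> > http://localhost:8080/solr/admin/collections?action=CREATE&name=tweets&numShards=3&replicationFactor=3&maxShardsPerNode=9&collection.configName=myconf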