Erick,

Once again, thank you very much for your attention.
Now my pseudo-distributed SolrCloud is configured with no inconsistency. An additional problem was that JBoss was started with "solr.data.dir" set to a path Solr did not expect (it was not even underneath the solr.home directory). This thread ( http://mail-archives.apache.org/mod_mbox/lucene-solr-user/201206.mbox/%3ccao8xr5zv8o-s6zn7ypaxpzpourqjknbsm59mbe6h3dpfykg...@mail.gmail.com%3E ) explains the inconsistency. I found no need to change the Solr data directory. After commenting out this property in JBoss's standalone.xml and setting "<lockType>${solr.lock.type:native}</lockType>", everything started to work properly.

Regards,
Bruno

2014-11-09 14:35 GMT-02:00 Erick Erickson <erickerick...@gmail.com>:
> OK, we're _definitely_ in the speculative realm here, so don't think I know more than I do ;)...
>
> The next thing I'd try is to go back to "native" as the lock type, on the theory that the lock type wasn't your problem, it was the too-frequent commits.
>
> bq: This file "_1.nvm" once existed. Was deleted during one auto commit, but remains somewhere in a queue for deletion
>
> Assuming Unix, this is entirely expected. Searchers have all the files open. Commits do background merges, which may delete segments. So the current searcher may have the file open even though it's been "merged away". When the searcher closes, the file will actually truly disappear.
>
> It's more complicated on Windows, but eventually that's what happens.
>
> Anyway, keep us posted. If this continues to occur, please open a new thread; that might catch the eye of people who are deep into Lucene file locking...
>
> Best,
> Erick
>
> On Sun, Nov 9, 2014 at 6:45 AM, Bruno Osiek <baos...@gmail.com> wrote:
> > Hi Erick,
> >
> > Thank you very much for your reply. I disabled client commits while setting commits in solrconfig.xml as follows:
> >
> > <autoCommit>
> >   <maxTime>${solr.autoCommit.maxTime:300000}</maxTime>
> >   <openSearcher>false</openSearcher>
> > </autoCommit>
> >
> > <autoSoftCommit>
> >   <maxTime>${solr.autoSoftCommit.maxTime:60000}</maxTime>
> > </autoSoftCommit>
> >
> > The picture changed for the better. No more index corruption or endless replication attempts, and up till now (16 hours since start-up, more than 142k tweets downloaded) shards and replicas are "active".
> >
> > One problem remains, though.
> > While auto committing, Solr logs the following stack trace:
> >
> > 00:00:40,383 ERROR [org.apache.solr.update.CommitTracker] (commitScheduler-25-thread-1) auto commit error...:org.apache.solr.common.SolrException: *Error opening new searcher*
> >   at org.apache.solr.core.SolrCore.openNewSearcher(SolrCore.java:1550)
> >   at org.apache.solr.core.SolrCore.getSearcher(SolrCore.java:1662)
> >   at org.apache.solr.update.DirectUpdateHandler2.commit(DirectUpdateHandler2.java:603)
> >   at org.apache.solr.update.CommitTracker.run(CommitTracker.java:216)
> >   at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
> >   at java.util.concurrent.FutureTask.run(FutureTask.java:262)
> >   at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:178)
> >   at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:292)
> >   at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> >   at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> >   at java.lang.Thread.run(Thread.java:745)
> > *Caused by: java.lang.RuntimeException: java.io.FileNotFoundException: _1.nvm*
> >   at org.apache.lucene.index.TieredMergePolicy$SegmentByteSizeDescending.compare(TieredMergePolicy.java:252)
> >   at org.apache.lucene.index.TieredMergePolicy$SegmentByteSizeDescending.compare(TieredMergePolicy.java:238)
> >   at java.util.TimSort.countRunAndMakeAscending(TimSort.java:324)
> >   at java.util.TimSort.sort(TimSort.java:203)
> >   at java.util.TimSort.sort(TimSort.java:173)
> >   at java.util.Arrays.sort(Arrays.java:659)
> >   at java.util.Collections.sort(Collections.java:217)
> >   at org.apache.lucene.index.TieredMergePolicy.findMerges(TieredMergePolicy.java:286)
> >   at org.apache.lucene.index.IndexWriter.updatePendingMerges(IndexWriter.java:2017)
> >   at org.apache.lucene.index.IndexWriter.maybeMerge(IndexWriter.java:1986)
> >   at org.apache.lucene.index.IndexWriter.getReader(IndexWriter.java:407)
> >   at org.apache.lucene.index.StandardDirectoryReader.doOpenFromWriter(StandardDirectoryReader.java:287)
> >   at org.apache.lucene.index.StandardDirectoryReader.doOpenIfChanged(StandardDirectoryReader.java:272)
> >   at org.apache.lucene.index.DirectoryReader.openIfChanged(DirectoryReader.java:251)
> >   at org.apache.solr.core.SolrCore.openNewSearcher(SolrCore.java:1461)
> >   ... 10 more
> > *Caused by: java.io.FileNotFoundException: _1.nvm*
> >   at org.apache.lucene.store.FSDirectory.fileLength(FSDirectory.java:260)
> >   at org.apache.lucene.store.NRTCachingDirectory.fileLength(NRTCachingDirectory.java:177)
> >   at org.apache.lucene.index.SegmentCommitInfo.sizeInBytes(SegmentCommitInfo.java:141)
> >   at org.apache.lucene.index.MergePolicy.size(MergePolicy.java:513)
> >   at org.apache.lucene.index.TieredMergePolicy$SegmentByteSizeDescending.compare(TieredMergePolicy.java:242)
> >   ... 24 more
> >
> > This file "_1.nvm" once existed. It was deleted during one auto commit, but remains somewhere in a queue for deletion. I believe the consequence is that in the SolrCloud Admin UI -> Core Admin -> Stats, the "Current" status is off for all shards' replica number 3. If I understand correctly, this means that changes to the index are not becoming visible.
> >
> > Once again I tried to find possible reasons for that situation, but none of the threads I found seems to reflect my case.
> >
> > My lock type is set to: <lockType>${solr.lock.type:single}</lockType>. This is due to a lock.wait timeout error with both "native" and "simple" when trying to create the collection using the commands API. There is a thread discussing this issue:
> >
> > http://lucene.472066.n3.nabble.com/unable-to-load-core-after-cluster-restart-td4098731.html
> >
> > The only thing is that "single" should only be used if "there is no possibility of another process trying to modify the index", and I cannot guarantee that. Could that be the cause of the file-not-found exception?
> >
> > Thanks once again for your help.
> >
> > Regards,
> > Bruno.
> >
> > 2014-11-08 18:36 GMT-02:00 Erick Erickson <erickerick...@gmail.com>:
> >
> >> First: for tweets, committing every 500 docs is much too frequent, especially from the client and super-especially if you have multiple clients running. I'd recommend you just configure solrconfig this way as a place to start and do NOT commit from any clients:
> >> 1> a hard commit (openSearcher=false) every minute (or maybe 5 minutes)
> >> 2> a soft commit every minute
> >>
> >> This latter governs how long it'll be between when a doc is indexed and when it can be searched.
> >>
> >> Here's a long post about how all this works:
> >> https://lucidworks.com/blog/understanding-transaction-logs-softcommit-and-commit-in-sorlcloud/
> >>
> >> As far as the rest, it's a puzzle, definitely. If it continues, a complete stack trace would be a good thing to start with.
> >>
> >> Best,
> >> Erick
> >>
> >> On Sat, Nov 8, 2014 at 9:47 AM, Bruno Osiek <baos...@gmail.com> wrote:
> >> > Hi,
> >> >
> >> > I am a newbie SolrCloud enthusiast. My goal is to implement an infrastructure to enable text analysis (clustering, classification, information extraction, sentiment analysis, etc).
> >> >
> >> > My development environment consists of one machine: quad-core processor, 16GB RAM and a 1TB HD.
> >> >
> >> > I have started implementing Apache Flume with Twitter as the source and SolrCloud (within JBoss AS 7) as the sink, using Zookeeper (5 servers) to upload configuration and manage the cluster.
> >> >
> >> > The pseudo-distributed cluster consists of one collection with three shards, each with three replicas.
> >> >
> >> > Everything runs smoothly for a while. After about 50,000 tweets committed (actually CloudSolrServer commits every batch of 500 documents), SolrCloud randomly starts logging exceptions: Lucene file not found, IndexWriter cannot be opened, replication unsuccessful and the like. Recovery starts with no success until the replica goes down.
> >> >
> >> > I have tried different Solr versions (4.10.2, 4.9.1 and lastly 4.8.1) with the same results.
> >> >
> >> > I have looked everywhere for help before writing this email. My guess right now is that the problem lies with the SolrCloud and Zookeeper connection, although I haven't seen any such exception.
> >> >
> >> > Any reference or help will be welcomed.
> >> >
> >> > Cheers,
> >> > B.
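
P.S. For anyone who finds this thread in the archives later, this is roughly what the relevant pieces of my solrconfig.xml look like after the changes above. It is only a sketch: the lockType and the commit intervals are the values quoted earlier in this thread, while the surrounding element layout (indexConfig and the DirectUpdateHandler2 updateHandler block) follows the stock Solr 4.x example config, and anything not shown is left at its defaults.

  <indexConfig>
    <!-- Back on the native lock factory, as Erick suggested; the lock.wait
         timeouts went away once solr.data.dir stopped pointing at an
         unexpected path. -->
    <lockType>${solr.lock.type:native}</lockType>
  </indexConfig>

  <updateHandler class="solr.DirectUpdateHandler2">
    <!-- Hard commit every 5 minutes without opening a new searcher,
         so segments get flushed but searchers are not churned. -->
    <autoCommit>
      <maxTime>${solr.autoCommit.maxTime:300000}</maxTime>
      <openSearcher>false</openSearcher>
    </autoCommit>
    <!-- Soft commit every minute; this controls how soon newly indexed
         tweets become searchable. No commits are sent from the client
         (CloudSolrServer) any more. -->
    <autoSoftCommit>
      <maxTime>${solr.autoSoftCommit.maxTime:60000}</maxTime>
    </autoSoftCommit>
  </updateHandler>

The hard commit interval mainly bounds how much work is replayed from the transaction log on restart, while the soft commit interval is the visibility latency, which is the distinction Erick's linked post explains in detail.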