In addition, I had tried indexing heavily while also searching on the same server (on Solr), and have since backed away from that. It held segments and searchers open longer than the available disk space would allow. I think that part of Solr could be rewritten to better handle the NRT use case; there is no reason concurrent indexing and searching should not be workable, even with our somewhat large and sometimes long-running queries. For now I've gone back to using replication to avoid the combination of indexing + searching and the out-of-disk-space exceptions it triggers. For RT, though, replication probably won't be useful because there will be no way to ship already-analyzed documents across the wire.
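As an aside, the cheapest guard I can think of on the write side is simply to refuse to commit when the index volume is nearly full, rather than letting a merge die halfway through and leave truncated files behind. A rough sketch of the idea (the class name and the 20 GB threshold are made up for illustration; File.getUsableSpace() needs Java 6+):

import java.io.File;
import java.io.IOException;

import org.apache.lucene.index.IndexWriter;

// Hypothetical helper, not part of Solr: refuse to commit when the index
// volume is nearly full, so a merge doesn't die mid-write with
// "No space left on device" and leave truncated files behind.
public class DiskSpaceGuard {

    // Assumed headroom -- merges can transiently need roughly the combined
    // size of the segments being merged, so this should be generous.
    private static final long MIN_FREE_BYTES = 20L * 1024 * 1024 * 1024;

    public static void commitIfRoom(IndexWriter writer, File indexDir) throws IOException {
        long free = indexDir.getUsableSpace(); // Java 6+
        if (free < MIN_FREE_BYTES) {
            throw new IOException("Only " + free + " bytes free under " + indexDir
                + "; refusing to commit until space is reclaimed");
        }
        writer.commit();
    }
}

It's only a band-aid, since a merge kicked off by a commit can still outgrow whatever headroom you pick, but it at least turns a corrupting failure into a refused commit.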
> Can you post the exceptions you hit? (Are these logged?).

Copied below are several. We managed to experience a panoply of exceptions:

SEVERE: org.apache.lucene.index.MergePolicy$MergeException: MergePolicy selected non-contiguous segments to merge (_1pw:C4035->_1pb _1py:C19180->_1py _1pz:C22252->_1py _1q0:C23005->_1py _1q1:C22051->_1py _1q2:C19520->_1py _1q3:C17143->_1py _1q4:C18015->_1py _1q5:C19764->_1py _1q6:C18967->_1py vs _v7:C10151578 _1pw:C9958546 _2kn:C10372070 _3fs:C11462047 _4af:C11120971 _55i:C12402453 _60d:C11249698 _6v8:C11308887 _7py:C13299679 _8ku:C11369240 _sy:C12218112 _1np:C11597131 _1ns:C65126 _1o3:C65375 _1oe:C63724 _1op:C60821 _1p0:C80242 _1pa:C118076 _1pl:C170005->_1pb _1px:C213967->_1pb _1pw:C4035->_1pb _1py:C19180->_1py _1pz:C22252->_1py _1q0:C23005->_1py _1q1:C22051->_1py _1q2:C19520->_1py _1q3:C17143->_1py _1q4:C18015->_1py _1q5:C19764->_1py _1q6:C18967->_1py _1q7:C15903->_1py _1q8:C15061->_1py _1q9:C17304->_1py _1qa:C16683->_1py _1qb:C16076->_1py _1qc:C15160->_1py _1qd:C14191->_1py _1qe:C13448->_1py _1qf:C13002->_1py _1qg:C13040->_1py _1qh:C13222->_1py _1qi:C12896->_1py _1qj:C12559->_1py _1qk:C12163->_1py), which IndexWriter (currently) cannot handle

On solr04:

Nov 6, 2010 2:44:31 PM org.apache.solr.common.SolrException log
SEVERE: org.apache.lucene.store.LockObtainFailedException: Lock obtain timed out: NativeFSLock@/mnt/solr/./data/index/lucene-5a92641e18d5832f54989a60e612116b-write.lock
        at org.apache.lucene.store.Lock.obtain(Lock.java:85)
        at org.apache.lucene.index.IndexWriter.init(IndexWriter.java:1545)
        at org.apache.lucene.index.IndexWriter.<init>(IndexWriter.java:1402)
        at org.apache.solr.update.SolrIndexWriter.<init>(SolrIndexWriter.java:190)
        at org.apache.solr.update.UpdateHandler.createMainIndexWriter(UpdateHandler.java:98)
        at org.apache.solr.update.DirectUpdateHandler2.openWriter(DirectUpdateHandler2.java:173)

Exception in thread "Lucene Merge Thread #2" org.apache.lucene.index.MergePolicy$MergeException: java.io.IOException: No space left on device
        at org.apache.lucene.index.ConcurrentMergeScheduler.handleMergeException(ConcurrentMergeScheduler.java:351)
        at org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:315)
Caused by: java.io.IOException: No space left on device
        at java.io.RandomAccessFile.writeBytes(Native Method)
        at java.io.RandomAccessFile.write(RandomAccessFile.java:499)
        at org.apache.lucene.store.SimpleFSDirectory$SimpleFSIndexOutput.flushBuffer(SimpleFSDirectory.java:192)
        at org.apache.lucene.store.BufferedIndexOutput.flushBuffer(BufferedIndexOutput.java:96)
        at org.apache.lucene.store.BufferedIndexOutput.flush(BufferedIndexOutput.java:85)
        at org.apache.lucene.store.BufferedIndexOutput.seek(BufferedIndexOutput.java:124)
        at org.apache.lucene.store.SimpleFSDirectory$SimpleFSIndexOutput.seek(SimpleFSDirectory.java:217)
        at org.apache.lucene.index.TermInfosWriter.close(TermInfosWriter.java:220)
        at org.apache.lucene.index.FormatPostingsFieldsWriter.finish(FormatPostingsFieldsWriter.java:70)
        at org.apache.lucene.index.SegmentMerger.mergeTerms(SegmentMerger.java:589)
        at org.apache.lucene.index.SegmentMerger.merge(SegmentMerger.java:154)
        at org.apache.lucene.index.IndexWriter.mergeMiddle(IndexWriter.java:5029)
        at org.apache.lucene.index.IndexWriter.merge(IndexWriter.java:4614)
        at org.apache.lucene.index.ConcurrentMergeScheduler.doMerge(ConcurrentMergeScheduler.java:235)
        at org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:291)
null
-------------------------------------------------------------

java.lang.RuntimeException: org.apache.lucene.index.CorruptIndexException: doc counts differ for segment _v4: fieldsReader shows 117150 but segmentInfo shows 10331041
        at org.apache.solr.core.SolrCore.getSearcher(SolrCore.java:1079)
        at org.apache.solr.core.SolrCore.<init>(SolrCore.java:583)
        at org.apache.solr.core.CoreContainer$Initializer.initialize(CoreContainer.java:137)
        at org.apache.solr.servlet.SolrDispatchFilter.init(SolrDispatchFilter.java:83)
        at

HTTP Status 500 - null

java.lang.NullPointerException
        at org.apache.solr.request.XMLWriter.writePrim(XMLWriter.java:761)
        at org.apache.solr.request.XMLWriter.writeStr(XMLWriter.java:619)
        at org.apache.solr.schema.TextField.write(TextField.java:45)
        at org.apache.solr.schema.SchemaField.write(SchemaField.java:108)
        at org.apache.solr.request.XMLWriter.writeDoc(XMLWriter.java:311)
        at org.apache.solr.request.XMLWriter$3.writeDocs(XMLWriter.java:483)
        at org.apache.solr.request.XMLWriter.writeDocuments(XMLWriter.java:420)
        at org.apache.solr.request.XMLWriter.writeDocList(XMLWriter.java:457)
        at org.apache.solr.request.XMLWriter.writeVal(XMLWriter.java:520)
        at org.apache.solr.request.XMLWriter.writeResponse(XMLWriter.java:130)
        at org.apache.solr.request.XMLResponseWriter.write(XMLResponseWriter.java:34)
        at org.apache.solr.servlet.SolrDispatchFilter.writeResponse(SolrDispatchFilter.java:325)
        at

Nov 6, 2010 8:31:49 AM org.apache.solr.common.SolrException log
SEVERE: java.io.IOException: read past EOF
        at org.apache.lucene.store.BufferedIndexInput.readBytes(BufferedIndexInput.java:135)
        at org.apache.lucene.index.SegmentReader$Norm.bytes(SegmentReader.java:455)
        at org.apache.lucene.index.SegmentReader.getNorms(SegmentReader.java:1068)
        at org.apache.lucene.index.SegmentReader.norms(SegmentReader.java:1074)
        at org.apache.solr.search.SolrIndexReader.norms(SolrIndexReader.java:282)
        at org.apache.lucene.search.TermQuery$TermWeight.scorer(TermQuery.java:72)
        at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:246)
        at org.apache.lucene.search.Searcher.search(Searcher.java:171)
        at org.apache.solr.search.SolrIndexSearcher.getDocListNC(SolrIndexSearcher.java:988)
        at org.apache.solr.search.SolrIndexSearcher.getDocListC(SolrIndexSearcher.java:884)
        at org.apache.solr.search.SolrIndexSearcher.search(SolrIndexSearcher.java:341)
        at org.apache.solr.handler.component.QueryComponent.process(QueryComponent.java:182)
        at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:195)

On Fri, Nov 5, 2010 at 2:59 PM, Michael McCandless
<luc...@mikemccandless.com> wrote:
> See TestIndexWriterOnDiskFull (on trunk). Look for the test w/
> LUCENE-2743 in the comment... but the other tests there also test
> other cases that may hit disk full.
>
> Can you post the exceptions you hit? (Are these logged?).
>
> Yes this could be a hardware issue...
>
> Millions of docs indexed per hour sounds like fun!
>
> Mike
>
> On Fri, Nov 5, 2010 at 5:33 PM, Jason Rutherglen
> <jason.rutherg...@gmail.com> wrote:
>>> can you enable IndexWriter's infoStream
>>
>> I'd like to however the problem is only happening in production, and
>> the indexing volume is in the millions per hour. The log would be
>> clogged up, as it is I have logging in Tomcat turned off because it is
>> filling up the SSD drive (yes I know, we should have an HD drive as
>> well, I didn't configure the server, and we're getting new ones,
>> thanks for wondering).
>>
>> Can you point me at the unit test that simulates this issue? Today I
>> saw a different problem in that the doc store got corrupted, given
>> we're streaming it to disk, how are we capturing disk full for that
>> case? Meaning how can we be sure where the doc store stopped writing
>> at? I haven't had time to explore what's up with this however I will
>> shortly, ie, examine the unit tests and code. Perhaps though this is
>> simply hardware related?
>>
>> On Fri, Nov 5, 2010 at 1:58 AM, Michael McCandless
>> <luc...@mikemccandless.com> wrote:
>>> Hmmm... Jason can you enable IndexWriter's infoStream and get the
>>> corruption to happen again and post that (along with "ls -l" output)?
>>>
>>> Mike
>>>
>>> On Thu, Nov 4, 2010 at 5:11 PM, Jason Rutherglen
>>> <jason.rutherg...@gmail.com> wrote:
>>>> I'm still seeing this error after downloading the latest 2.9 branch
>>>> version, compiling, copying to Solr 1.4 and deploying. Basically as
>>>> mentioned, the .del files are of zero length... Hmm...
>>>>
>>>> On Wed, Oct 13, 2010 at 1:33 PM, Jason Rutherglen
>>>> <jason.rutherg...@gmail.com> wrote:
>>>>> Thanks Robert, that Jira issue aptly describes what I'm seeing, I think.
>>>>>
>>>>> On Wed, Oct 13, 2010 at 10:22 AM, Robert Muir <rcm...@gmail.com> wrote:
>>>>>> if you are going to fill up your disk space all the time with solr
>>>>>> 1.4.1, I suggest replacing the lucene jars with lucene jars from
>>>>>> 2.9-branch
>>>>>> (http://svn.apache.org/repos/asf/lucene/java/branches/lucene_2_9/).
>>>>>>
>>>>>> then you get the fix for
>>>>>> https://issues.apache.org/jira/browse/LUCENE-2593 too.
>>>>>>
>>>>>> On Wed, Oct 13, 2010 at 11:37 AM, Jason Rutherglen
>>>>>> <jason.rutherg...@gmail.com> wrote:
>>>>>>> We have unit tests for running out of disk space? However we have
>>>>>>> Tomcat logs that fill up quickly and starve Solr 1.4.1 of space. The
>>>>>>> main segments are probably not corrupted, however routinely now, there
>>>>>>> are deletes files of length 0.
>>>>>>>
>>>>>>> 0 2010-10-12 18:35 _cc_8.del
>>>>>>>
>>>>>>> Which is fundamental index corruption, though less extreme. Are we
>>>>>>> testing for this?
>>>>>>
>>>>>
>>>>
>>>
>>
>
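P.S. Regarding the zero-length .del files and the _v4 doc-count mismatch in the quoted thread above: a rough sketch of how one could at least confirm what is still readable, using Lucene's CheckIndex (the index path is made up; fixIndex() drops whole segments it considers broken, so it loses documents and should only ever be run against a backup):

import java.io.File;
import java.io.PrintStream;

import org.apache.lucene.index.CheckIndex;
import org.apache.lucene.store.FSDirectory;

// Rough sketch: inspect a possibly-corrupted index with CheckIndex.
// The index path is hypothetical; point it at Solr's data/index directory.
public class CheckIndexExample {
    public static void main(String[] args) throws Exception {
        CheckIndex checker = new CheckIndex(FSDirectory.open(new File("/mnt/solr/data/index")));
        checker.setInfoStream(new PrintStream(System.out, true));
        CheckIndex.Status status = checker.checkIndex();
        if (!status.clean) {
            System.out.println("Index has problems; fixIndex() would drop "
                + status.totLoseDocCount + " docs.");
            // checker.fixIndex(status); // destructive: removes broken segments; back up first
        }
    }
}

The same check can be run from the command line with java -cp lucene-core.jar org.apache.lucene.index.CheckIndex /path/to/index, adding -fix only as a last resort.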