Yeah, this is Java 1.6. The indexes are being written to a local disk, but the files being indexed live on NFS.
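Since the advice quoted below depends on which 1.6 release is in use, it may help to confirm the exact version string from inside the JVM rather than from the installer name. The following is only a minimal sketch: the class name `JvmVersionCheck` and its helper methods are hypothetical, and it assumes the classic `1.6.0_NN` version format reported by the `java.version` system property. It does not know which update releases are affected; it only reports what is running.

```java
// Hypothetical helper, not part of Solr or Tomcat: prints the running JVM's
// version and extracts the update number from strings like "1.6.0_16".
final class JvmVersionCheck {

    /** Returns the update number after the underscore, or -1 if there is none. */
    public static int updateNumber(String version) {
        int underscore = version.indexOf('_');
        if (underscore < 0) {
            return -1;
        }
        // Read only the digits, so suffixes such as "-b01" or "-ea" are ignored.
        int end = underscore + 1;
        while (end < version.length() && Character.isDigit(version.charAt(end))) {
            end++;
        }
        if (end == underscore + 1) {
            return -1;
        }
        return Integer.parseInt(version.substring(underscore + 1, end));
    }

    /** True when the version string belongs to the 1.6 line. */
    public static boolean isJava6(String version) {
        return version.startsWith("1.6.");
    }

    public static void main(String[] args) {
        String version = System.getProperty("java.version");
        System.out.println("java.version = " + version);
        System.out.println("java.vendor  = " + System.getProperty("java.vendor"));
        System.out.println("is 1.6 line  = " + isJava6(version));
        System.out.println("update       = " + updateNumber(version));
    }
}
```

Running this with the same `java` binary that Tomcat is configured to use shows the release that Solr actually sees, which may differ from the default one on the PATH.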
-----Original Message-----
From: Lance Norskog [mailto:goks...@gmail.com]
Sent: Tuesday, October 06, 2009 2:59 PM
To: solr-user@lucene.apache.org
Subject: Re: Solr Timeouts

Is this Java 1.5? There are known threading bugs in 1.5 that were
fixed in Java 1.6. Also, there was one short series of 1.6 releases
that wrote bogus Lucene index files. So, make sure you use the latest
Java 1.6 release.

Also, I hope this is a local disk. Some shops try running over NFS or
Windows file sharing and this often does not work well.

Lance

On 10/6/09, Giovanni Fernandez-Kincade <gfernandez-kinc...@capitaliq.com> wrote:
> Is it possible that deletions are triggering these commits? Some of the
> documents that I'm making indexing requests for already exist in the index,
> so they would result in deletions. I tried messing with some of these
> parameters but I'm still running into the same problem:
>
> <deletionPolicy class="solr.SolrDeletionPolicy">
>   <!-- Keep only optimized commit points -->
>   <str name="keepOptimizedOnly">false</str>
>   <!-- The maximum number of commit points to be kept -->
>   <str name="maxCommitsToKeep">100</str>
>   <!--
>   Delete all commit points once they have reached the given age.
>   Supports DateMathParser syntax e.g.
>
>   <str name="maxCommitAge">30MINUTES</str>
>   <str name="maxCommitAge">1DAY</str>
>   -->
> </deletionPolicy>
>
> This is happening like every 30-40minutes and it's really hampering the
> indexing progress...
>
>
> -----Original Message-----
> From: Giovanni Fernandez-Kincade [mailto:gfernandez-kinc...@capitaliq.com]
> Sent: Monday, October 05, 2009 2:11 PM
> To: solr-user@lucene.apache.org; yo...@lucidimagination.com
> Subject: RE: Solr Timeouts
>
> I just grabbed another stack trace for a thread that has been similarly
> blocking for over an hour.
> Notice that there is no Commit in this one:
>
> http-8080-Processor67 [RUNNABLE] CPU time: 1:02:05
> org.apache.lucene.index.TermBuffer.read(IndexInput, FieldInfos)
> org.apache.lucene.index.SegmentTermEnum.next()
> org.apache.lucene.index.SegmentTermEnum.scanTo(Term)
> org.apache.lucene.index.TermInfosReader.get(Term, boolean)
> org.apache.lucene.index.TermInfosReader.get(Term)
> org.apache.lucene.index.SegmentTermDocs.seek(Term)
> org.apache.lucene.index.DocumentsWriter.applyDeletes(IndexReader, int)
> org.apache.lucene.index.DocumentsWriter.applyDeletes(SegmentInfos)
> org.apache.lucene.index.IndexWriter.applyDeletes()
> org.apache.lucene.index.IndexWriter.doFlushInternal(boolean, boolean)
> org.apache.lucene.index.IndexWriter.doFlush(boolean, boolean)
> org.apache.lucene.index.IndexWriter.flush(boolean, boolean, boolean)
> org.apache.lucene.index.IndexWriter.updateDocument(Term, Document, Analyzer)
> org.apache.lucene.index.IndexWriter.updateDocument(Term, Document)
> org.apache.solr.update.DirectUpdateHandler2.addDoc(AddUpdateCommand)
> org.apache.solr.update.processor.RunUpdateProcessor.processAdd(AddUpdateCommand)
> org.apache.solr.handler.extraction.ExtractingDocumentLoader.doAdd(SolrContentHandler,
> AddUpdateCommand)
> org.apache.solr.handler.extraction.ExtractingDocumentLoader.addDoc(SolrContentHandler)
> org.apache.solr.handler.extraction.ExtractingDocumentLoader.load(SolrQueryRequest,
> SolrQueryResponse, ContentStream)
> org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(SolrQueryRequest,
> SolrQueryResponse)
> org.apache.solr.handler.RequestHandlerBase.handleRequest(SolrQueryRequest,
> SolrQueryResponse)
> org.apache.solr.core.RequestHandlers$LazyRequestHandlerWrapper.handleRequest(SolrQueryRequest,
> SolrQueryResponse)
> org.apache.solr.core.SolrCore.execute(SolrRequestHandler, SolrQueryRequest,
> SolrQueryResponse)
> org.apache.solr.servlet.SolrDispatchFilter.execute(HttpServletRequest,
> SolrRequestHandler, SolrQueryRequest,
> SolrQueryResponse)
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(ServletRequest,
> ServletResponse, FilterChain)
> org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ServletRequest,
> ServletResponse)
> org.apache.catalina.core.ApplicationFilterChain.doFilter(ServletRequest,
> ServletResponse)
> org.apache.catalina.core.StandardWrapperValve.invoke(Request, Response)
> org.apache.catalina.core.StandardContextValve.invoke(Request, Response)
> org.apache.catalina.core.StandardHostValve.invoke(Request, Response)
> org.apache.catalina.valves.ErrorReportValve.invoke(Request, Response)
> org.apache.catalina.core.StandardEngineValve.invoke(Request, Response)
> org.apache.catalina.connector.CoyoteAdapter.service(Request, Response)
> org.apache.coyote.http11.Http11Processor.process(InputStream, OutputStream)
> org.apache.coyote.http11.Http11BaseProtocol$Http11ConnectionHandler.processConnection(TcpConnection,
> Object[])
> org.apache.tomcat.util.net.PoolTcpEndpoint.processSocket(Socket,
> TcpConnection, Object[])
> org.apache.tomcat.util.net.LeaderFollowerWorkerThread.runIt(Object[])
> org.apache.tomcat.util.threads.ThreadPool$ControlRunnable.run()
> java.lang.Thread.run()
>
>
> -----Original Message-----
> From: ysee...@gmail.com [mailto:ysee...@gmail.com] On Behalf Of Yonik Seeley
> Sent: Monday, October 05, 2009 1:18 PM
> To: solr-user@lucene.apache.org
> Subject: Re: Solr Timeouts
>
> OK... next step is to verify that SolrCell doesn't have a bug that
> causes it to commit.
> I'll try and verify today unless someone else beats me to it.
>
> -Yonik
> http://www.lucidimagination.com
>
> On Mon, Oct 5, 2009 at 1:04 PM, Giovanni Fernandez-Kincade
> <gfernandez-kinc...@capitaliq.com> wrote:
>> I'm fairly certain that all of the indexing jobs are calling SOLR with
>> commit=false. They all construct the indexing URLs using a CLR function I
>> wrote, which takes in a Commit parameter, which is always set to false.
>>
>> Also, I don't see any calls to commit in the Tomcat logs (whereas normally
>> when I make a commit call I do).
>>
>> This suggests that Solr is doing it automatically, but the extract handler
>> doesn't seem to be the problem:
>> <requestHandler name="/update/extract"
>> class="org.apache.solr.handler.extraction.ExtractingRequestHandler"
>> startup="lazy">
>>   <lst name="defaults">
>>     <str name="uprefix">ignored_</str>
>>     <str name="map.content">fileData</str>
>>   </lst>
>> </requestHandler>
>>
>>
>> There is no external config file specified, and I don't see anything about
>> commits here.
>>
>> I've tried setting up more detailed indexer logging but haven't been able
>> to get it to work:
>> <infoStream file="c:\solr\indexer.log">true</infoStream>
>>
>> I tried relative and absolute paths, but no dice so far.
>>
>> Any other ideas?
>>
>> -Gio.
>>
>> -----Original Message-----
>> From: ysee...@gmail.com [mailto:ysee...@gmail.com] On Behalf Of Yonik
>> Seeley
>> Sent: Monday, October 05, 2009 12:52 PM
>> To: solr-user@lucene.apache.org
>> Subject: Re: Solr Timeouts
>>
>>> This is what one of my SOLR requests look like:
>>>
>>> http://titans:8080/solr/update/extract/?literal.versionId=684936&literal.filingDate=1997-12-04T00:00:00Z&literal.formTypeId=95&literal.companyId=3567904&literal.sourceId=0&resource.name=684936.txt&commit=false
>>
>> Have you verified that all of your indexing jobs (you said you had 4
>> or 5) have commit=false?
>>
>> Also make sure that your extract handler doesn't have a default of
>> something that could cause a commit - like commitWithin or something.
>>
>> -Yonik
>> http://www.lucidimagination.com
>>
>>
>>
>> On Mon, Oct 5, 2009 at 12:44 PM, Giovanni Fernandez-Kincade
>> <gfernandez-kinc...@capitaliq.com> wrote:
>>> Is there somewhere other than solrConfig.xml that the autoCommit feature
>>> is enabled?
>>> I've looked through that file and found autocommit to be
>>> commented out:
>>>
>>> <!--
>>> Perform a <commit/> automatically under certain conditions:
>>> maxDocs - number of updates since last commit is greater than this
>>> maxTime - oldest uncommited update (in ms) is this long ago
>>> <autoCommit>
>>>   <maxDocs>10000</maxDocs>
>>>   <maxTime>1000</maxTime>
>>> </autoCommit>
>>> -->
>>>
>>> -----Original Message-----
>>> From: Feak, Todd [mailto:todd.f...@smss.sony.com]
>>> Sent: Monday, October 05, 2009 12:40 PM
>>> To: solr-user@lucene.apache.org
>>> Subject: RE: Solr Timeouts
>>>
>>> Actually, ignore my other response.
>>>
>>> I believe you are committing, whether you know it or not.
>>>
>>> This is in your provided stack trace
>>> org.apache.solr.handler.RequestHandlerUtils.handleCommit(UpdateRequestProcessor,
>>> SolrParams, boolean)
>>> org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(SolrQueryRequest,
>>> SolrQueryResponse)
>>>
>>> I think Yonik gave you additional information for how to make it faster.
>>>
>>> -Todd
>>>
>>> -----Original Message-----
>>> From: Giovanni Fernandez-Kincade
>>> [mailto:gfernandez-kinc...@capitaliq.com]
>>> Sent: Monday, October 05, 2009 9:30 AM
>>> To: solr-user@lucene.apache.org
>>> Subject: RE: Solr Timeouts
>>>
>>> I'm not committing at all actually - I'm waiting for all 6 million to be
>>> done.
>>>
>>> -----Original Message-----
>>> From: Feak, Todd [mailto:todd.f...@smss.sony.com]
>>> Sent: Monday, October 05, 2009 12:10 PM
>>> To: solr-user@lucene.apache.org
>>> Subject: RE: Solr Timeouts
>>>
>>> How often are you committing?
>>>
>>> Every time you commit, Solr will close the old index and open the new
>>> one.
>>> If you are doing this in parallel from multiple jobs (4-5 you
>>> mention) then eventually the server gets behind and you start to pile up
>>> commit requests. Once this starts to happen, it will cascade out of
>>> control if the rate of commits isn't slowed.
>>>
>>> -Todd
>>>
>>> ________________________________
>>> From: Giovanni Fernandez-Kincade
>>> [mailto:gfernandez-kinc...@capitaliq.com]
>>> Sent: Monday, October 05, 2009 9:04 AM
>>> To: solr-user@lucene.apache.org
>>> Subject: Solr Timeouts
>>>
>>> Hi,
>>> I'm attempting to index approximately 6 million HTML/Text files using
>>> SOLR 1.4/Tomcat6 on Windows Server 2003 x64. I'm running 64 bit Tomcat
>>> and JVM. I've fired up 4-5 different jobs that are making indexing
>>> requests using the ExtractionRequestHandler, and everything works well
>>> for about 30-40 minutes, after which all indexing requests start timing
>>> out. I profiled the server and found that all of the threads are getting
>>> blocked by this call to flush the Lucene index to disk (see below).
>>>
>>> This leads me to a few questions:
>>>
>>> 1. Is this normal?
>>>
>>> 2. Can I reduce the frequency with which this happens somehow? I've
>>> greatly increased the indexing options in SolrConfig.xml (attached here)
>>> to no avail.
>>>
>>> 3. During these flushes, resource utilization (CPU, I/O, Memory
>>> Consumption) is significantly down compared to when requests are being
>>> handled. Is there any way to make this index go faster? I have plenty of
>>> bandwidth on the machine.
>>>
>>> I appreciate any insight you can provide. We're currently using MS SQL
>>> 2005 as our full-text solution and are pretty much miserable. So far SOLR
>>> has been a great experience.
>>>
>>> Thanks,
>>> Gio.
>>>
>>> http-8080-Processor21 [RUNNABLE] CPU time: 9:51
>>> java.io.RandomAccessFile.seek(long)
>>> org.apache.lucene.store.SimpleFSDirectory$SimpleFSIndexInput.readInternal(byte[],
>>> int, int)
>>> org.apache.lucene.store.BufferedIndexInput.refill()
>>> org.apache.lucene.store.BufferedIndexInput.readByte()
>>> org.apache.lucene.store.IndexInput.readVInt()
>>> org.apache.lucene.index.TermBuffer.read(IndexInput, FieldInfos)
>>> org.apache.lucene.index.SegmentTermEnum.next()
>>> org.apache.lucene.index.SegmentTermEnum.scanTo(Term)
>>> org.apache.lucene.index.TermInfosReader.get(Term, boolean)
>>> org.apache.lucene.index.TermInfosReader.get(Term)
>>> org.apache.lucene.index.SegmentTermDocs.seek(Term)
>>> org.apache.lucene.index.DocumentsWriter.applyDeletes(IndexReader, int)
>>> org.apache.lucene.index.DocumentsWriter.applyDeletes(SegmentInfos)
>>> org.apache.lucene.index.IndexWriter.applyDeletes()
>>> org.apache.lucene.index.IndexWriter.doFlushInternal(boolean, boolean)
>>> org.apache.lucene.index.IndexWriter.doFlush(boolean, boolean)
>>> org.apache.lucene.index.IndexWriter.flush(boolean, boolean, boolean)
>>> org.apache.lucene.index.IndexWriter.closeInternal(boolean)
>>> org.apache.lucene.index.IndexWriter.close(boolean)
>>> org.apache.lucene.index.IndexWriter.close()
>>> org.apache.solr.update.SolrIndexWriter.close()
>>> org.apache.solr.update.DirectUpdateHandler2.closeWriter()
>>> org.apache.solr.update.DirectUpdateHandler2.commit(CommitUpdateCommand)
>>> org.apache.solr.update.processor.RunUpdateProcessor.processCommit(CommitUpdateCommand)
>>> org.apache.solr.handler.RequestHandlerUtils.handleCommit(UpdateRequestProcessor,
>>> SolrParams, boolean)
>>> org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(SolrQueryRequest,
>>> SolrQueryResponse)
>>> org.apache.solr.handler.RequestHandlerBase.handleRequest(SolrQueryRequest,
>>> SolrQueryResponse)
>>> org.apache.solr.core.RequestHandlers$LazyRequestHandlerWrapper.handleRequest(SolrQueryRequest,
>>> SolrQueryResponse)
>>> org.apache.solr.core.SolrCore.execute(SolrRequestHandler,
>>> SolrQueryRequest, SolrQueryResponse)
>>> org.apache.solr.servlet.SolrDispatchFilter.execute(HttpServletRequest,
>>> SolrRequestHandler, SolrQueryRequest, SolrQueryResponse)
>>> org.apache.solr.servlet.SolrDispatchFilter.doFilter(ServletRequest,
>>> ServletResponse, FilterChain)
>>> org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ServletRequest,
>>> ServletResponse)
>>> org.apache.catalina.core.ApplicationFilterChain.doFilter(ServletRequest,
>>> ServletResponse)
>>> org.apache.catalina.core.StandardWrapperValve.invoke(Request, Response)
>>> org.apache.catalina.core.StandardContextValve.invoke(Request, Response)
>>> org.apache.catalina.core.StandardHostValve.invoke(Request, Response)
>>> org.apache.catalina.valves.ErrorReportValve.invoke(Request, Response)
>>> org.apache.catalina.core.StandardEngineValve.invoke(Request, Response)
>>> org.apache.catalina.connector.CoyoteAdapter.service(Request, Response)
>>> org.apache.coyote.http11.Http11Processor.process(InputStream,
>>> OutputStream)
>>> org.apache.coyote.http11.Http11BaseProtocol$Http11ConnectionHandler.processConnection(TcpConnection,
>>> Object[])
>>> org.apache.tomcat.util.net.PoolTcpEndpoint.processSocket(Socket,
>>> TcpConnection, Object[])
>>> org.apache.tomcat.util.net.LeaderFollowerWorkerThread.runIt(Object[])
>>> org.apache.tomcat.util.threads.ThreadPool$ControlRunnable.run()
>>> java.lang.Thread.run()
>>>
>>
>

--
Lance Norskog
goks...@gmail.com