On Mon, Oct 5, 2009 at 12:03 PM, Giovanni Fernandez-Kincade
<gfernandez-kinc...@capitaliq.com> wrote:
> Hi,
>
> I’m attempting to index approximately 6 million HTML/Text files using SOLR
> 1.4/Tomcat6 on Windows Server 2003 x64. I’m running 64 bit Tomcat and JVM.
> I’ve fired up 4-5 different jobs that are making indexing requests using the
> ExtractionRequestHandler, and everything works well for about 30-40 minutes,
> after which all indexing requests start timing out. I profiled the server
> and found that all of the threads are getting blocked by this call to flush
> the Lucene index to disk (see below).
>
>
>
> This leads me to a few questions:
>
> 1.     Is this normal?

Yes... one can't currently add documents while the first part of a
commit is going on (closing the IndexWriter).  The threads block and
then resume once the writer has been successfully closed.  This is
normally fine, and you can work around it by increasing the servlet
container timeout.

Due to advances in Lucene, this restriction will probably be lifted in
the next version of Solr (1.5).

> 2.     Can I reduce the frequency with which this happens somehow? I’ve
> greatly increased the indexing options in SolrConfig.xml (attached here) to
> no avail.

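It looks like Solr is committing because you told it to?  In the stack
trace you posted, the commit comes from RequestHandlerUtils.handleCommit,
called straight from the request handler, which suggests the indexing
requests themselves carry a commit parameter.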

> 3.     During these flushes, resource utilization (CPU, I/O, Memory
> Consumption) is significantly down compared to when requests are being
> handled. Is there any way to make this index go faster? I have plenty of
> bandwidth on the machine.

Don't commit until you're done with a big indexing run?
If you're using SolrJ, use the StreamingUpdateSolrServer... it's much faster!
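
To make the "commit once at the end" idea concrete, here is a minimal
sketch that only builds the request URLs for a run: one extraction
request per document with no commit parameter, followed by a single
commit.  The base URL, the /update/extract handler path, and the class
and method names are assumptions for illustration; adjust them to your
own solrconfig.xml and deployment.  It does not send any requests.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.ArrayList;
import java.util.List;

class CommitOncePlan {
    // Assumed base URL (Tomcat on port 8080, webapp deployed as /solr).
    static final String BASE = "http://localhost:8080/solr";

    /** URL for one extraction request. Note: no commit parameter. */
    static String extractUrl(String docId) throws UnsupportedEncodingException {
        // Assumes the extracting handler is mapped to /update/extract.
        return BASE + "/update/extract?literal.id="
                + URLEncoder.encode(docId, "UTF-8");
    }

    /** URL for the single commit issued after the whole run. */
    static String commitUrl() {
        return BASE + "/update?commit=true";
    }

    /** Requests in order: one extract call per document, then one commit. */
    static List<String> plan(List<String> docIds) throws UnsupportedEncodingException {
        List<String> urls = new ArrayList<String>();
        for (String id : docIds) {
            urls.add(extractUrl(id));
        }
        urls.add(commitUrl());
        return urls;
    }

    public static void main(String[] args) throws Exception {
        List<String> ids = new ArrayList<String>();
        ids.add("doc 1");
        ids.add("doc2");
        for (String url : plan(ids)) {
            System.out.println(url);
        }
    }
}
```

With SolrJ the same shape applies: call add(...) for each document and
call commit() once when the run is finished.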

-Yonik
http://www.lucidimagination.com


> I appreciate any insight you can provide. We’re currently using MS SQL 2005
> as our full-text solution and are pretty much miserable. So far SOLR has
> been a great experience.
>
>
>
> Thanks,
>
> Gio.
>
>
>
> http-8080-Processor21 [RUNNABLE] CPU time: 9:51
> java.io.RandomAccessFile.seek(long)
> org.apache.lucene.store.SimpleFSDirectory$SimpleFSIndexInput.readInternal(byte[], int, int)
> org.apache.lucene.store.BufferedIndexInput.refill()
> org.apache.lucene.store.BufferedIndexInput.readByte()
> org.apache.lucene.store.IndexInput.readVInt()
> org.apache.lucene.index.TermBuffer.read(IndexInput, FieldInfos)
> org.apache.lucene.index.SegmentTermEnum.next()
> org.apache.lucene.index.SegmentTermEnum.scanTo(Term)
> org.apache.lucene.index.TermInfosReader.get(Term, boolean)
> org.apache.lucene.index.TermInfosReader.get(Term)
> org.apache.lucene.index.SegmentTermDocs.seek(Term)
> org.apache.lucene.index.DocumentsWriter.applyDeletes(IndexReader, int)
> org.apache.lucene.index.DocumentsWriter.applyDeletes(SegmentInfos)
> org.apache.lucene.index.IndexWriter.applyDeletes()
> org.apache.lucene.index.IndexWriter.doFlushInternal(boolean, boolean)
> org.apache.lucene.index.IndexWriter.doFlush(boolean, boolean)
> org.apache.lucene.index.IndexWriter.flush(boolean, boolean, boolean)
> org.apache.lucene.index.IndexWriter.closeInternal(boolean)
> org.apache.lucene.index.IndexWriter.close(boolean)
> org.apache.lucene.index.IndexWriter.close()
> org.apache.solr.update.SolrIndexWriter.close()
> org.apache.solr.update.DirectUpdateHandler2.closeWriter()
> org.apache.solr.update.DirectUpdateHandler2.commit(CommitUpdateCommand)
> org.apache.solr.update.processor.RunUpdateProcessor.processCommit(CommitUpdateCommand)
> org.apache.solr.handler.RequestHandlerUtils.handleCommit(UpdateRequestProcessor, SolrParams, boolean)
> org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(SolrQueryRequest, SolrQueryResponse)
> org.apache.solr.handler.RequestHandlerBase.handleRequest(SolrQueryRequest, SolrQueryResponse)
> org.apache.solr.core.RequestHandlers$LazyRequestHandlerWrapper.handleRequest(SolrQueryRequest, SolrQueryResponse)
> org.apache.solr.core.SolrCore.execute(SolrRequestHandler, SolrQueryRequest, SolrQueryResponse)
> org.apache.solr.servlet.SolrDispatchFilter.execute(HttpServletRequest, SolrRequestHandler, SolrQueryRequest, SolrQueryResponse)
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(ServletRequest, ServletResponse, FilterChain)
> org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ServletRequest, ServletResponse)
> org.apache.catalina.core.ApplicationFilterChain.doFilter(ServletRequest, ServletResponse)
> org.apache.catalina.core.StandardWrapperValve.invoke(Request, Response)
> org.apache.catalina.core.StandardContextValve.invoke(Request, Response)
> org.apache.catalina.core.StandardHostValve.invoke(Request, Response)
> org.apache.catalina.valves.ErrorReportValve.invoke(Request, Response)
> org.apache.catalina.core.StandardEngineValve.invoke(Request, Response)
> org.apache.catalina.connector.CoyoteAdapter.service(Request, Response)
> org.apache.coyote.http11.Http11Processor.process(InputStream, OutputStream)
> org.apache.coyote.http11.Http11BaseProtocol$Http11ConnectionHandler.processConnection(TcpConnection, Object[])
> org.apache.tomcat.util.net.PoolTcpEndpoint.processSocket(Socket, TcpConnection, Object[])
> org.apache.tomcat.util.net.LeaderFollowerWorkerThread.runIt(Object[])
> org.apache.tomcat.util.threads.ThreadPool$ControlRunnable.run()
> java.lang.Thread.run()
>
>
