Ok. Guess that isn't a problem. :) A second consideration... I could see lock contention being an issue with multiple clients indexing at once. Is there any disadvantage to serializing the clients to remove lock contention?
-Todd

-----Original Message-----
From: Giovanni Fernandez-Kincade [mailto:gfernandez-kinc...@capitaliq.com]
Sent: Monday, October 05, 2009 9:30 AM
To: solr-user@lucene.apache.org
Subject: RE: Solr Timeouts

I'm not committing at all actually - I'm waiting for all 6 million to be done.

-----Original Message-----
From: Feak, Todd [mailto:todd.f...@smss.sony.com]
Sent: Monday, October 05, 2009 12:10 PM
To: solr-user@lucene.apache.org
Subject: RE: Solr Timeouts

How often are you committing? Every time you commit, Solr will close the old index and open the new one. If you are doing this in parallel from multiple jobs (4-5 you mention) then eventually the server gets behind and you start to pile up commit requests. Once this starts to happen, it will cascade out of control if the rate of commits isn't slowed.

-Todd

________________________________
From: Giovanni Fernandez-Kincade [mailto:gfernandez-kinc...@capitaliq.com]
Sent: Monday, October 05, 2009 9:04 AM
To: solr-user@lucene.apache.org
Subject: Solr Timeouts

Hi,

I'm attempting to index approximately 6 million HTML/Text files using SOLR 1.4/Tomcat6 on Windows Server 2003 x64. I'm running 64 bit Tomcat and JVM. I've fired up 4-5 different jobs that are making indexing requests using the ExtractionRequestHandler, and everything works well for about 30-40 minutes, after which all indexing requests start timing out. I profiled the server and found that all of the threads are getting blocked by this call to flush the Lucene index to disk (see below).

This leads me to a few questions:

1. Is this normal?
2. Can I reduce the frequency with which this happens somehow? I've greatly increased the indexing options in SolrConfig.xml (attached here) to no avail.
3. During these flushes, resource utilization (CPU, I/O, Memory Consumption) is significantly down compared to when requests are being handled. Is there any way to make this index go faster? I have plenty of bandwidth on the machine.

I appreciate any insight you can provide. We're currently using MS SQL 2005 as our full-text solution and are pretty much miserable. So far SOLR has been a great experience.

Thanks,
Gio.

http-8080-Processor21 [RUNNABLE] CPU time: 9:51
java.io.RandomAccessFile.seek(long)
org.apache.lucene.store.SimpleFSDirectory$SimpleFSIndexInput.readInternal(byte[], int, int)
org.apache.lucene.store.BufferedIndexInput.refill()
org.apache.lucene.store.BufferedIndexInput.readByte()
org.apache.lucene.store.IndexInput.readVInt()
org.apache.lucene.index.TermBuffer.read(IndexInput, FieldInfos)
org.apache.lucene.index.SegmentTermEnum.next()
org.apache.lucene.index.SegmentTermEnum.scanTo(Term)
org.apache.lucene.index.TermInfosReader.get(Term, boolean)
org.apache.lucene.index.TermInfosReader.get(Term)
org.apache.lucene.index.SegmentTermDocs.seek(Term)
org.apache.lucene.index.DocumentsWriter.applyDeletes(IndexReader, int)
org.apache.lucene.index.DocumentsWriter.applyDeletes(SegmentInfos)
org.apache.lucene.index.IndexWriter.applyDeletes()
org.apache.lucene.index.IndexWriter.doFlushInternal(boolean, boolean)
org.apache.lucene.index.IndexWriter.doFlush(boolean, boolean)
org.apache.lucene.index.IndexWriter.flush(boolean, boolean, boolean)
org.apache.lucene.index.IndexWriter.closeInternal(boolean)
org.apache.lucene.index.IndexWriter.close(boolean)
org.apache.lucene.index.IndexWriter.close()
org.apache.solr.update.SolrIndexWriter.close()
org.apache.solr.update.DirectUpdateHandler2.closeWriter()
org.apache.solr.update.DirectUpdateHandler2.commit(CommitUpdateCommand)
org.apache.solr.update.processor.RunUpdateProcessor.processCommit(CommitUpdateCommand)
org.apache.solr.handler.RequestHandlerUtils.handleCommit(UpdateRequestProcessor, SolrParams, boolean)
org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(SolrQueryRequest, SolrQueryResponse)
org.apache.solr.handler.RequestHandlerBase.handleRequest(SolrQueryRequest, SolrQueryResponse)
org.apache.solr.core.RequestHandlers$LazyRequestHandlerWrapper.handleRequest(SolrQueryRequest, SolrQueryResponse)
org.apache.solr.core.SolrCore.execute(SolrRequestHandler, SolrQueryRequest, SolrQueryResponse)
org.apache.solr.servlet.SolrDispatchFilter.execute(HttpServletRequest, SolrRequestHandler, SolrQueryRequest, SolrQueryResponse)
org.apache.solr.servlet.SolrDispatchFilter.doFilter(ServletRequest, ServletResponse, FilterChain)
org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ServletRequest, ServletResponse)
org.apache.catalina.core.ApplicationFilterChain.doFilter(ServletRequest, ServletResponse)
org.apache.catalina.core.StandardWrapperValve.invoke(Request, Response)
org.apache.catalina.core.StandardContextValve.invoke(Request, Response)
org.apache.catalina.core.StandardHostValve.invoke(Request, Response)
org.apache.catalina.valves.ErrorReportValve.invoke(Request, Response)
org.apache.catalina.core.StandardEngineValve.invoke(Request, Response)
org.apache.catalina.connector.CoyoteAdapter.service(Request, Response)
org.apache.coyote.http11.Http11Processor.process(InputStream, OutputStream)
org.apache.coyote.http11.Http11BaseProtocol$Http11ConnectionHandler.processConnection(TcpConnection, Object[])
org.apache.tomcat.util.net.PoolTcpEndpoint.processSocket(Socket, TcpConnection, Object[])
org.apache.tomcat.util.net.LeaderFollowerWorkerThread.runIt(Object[])
org.apache.tomcat.util.threads.ThreadPool$ControlRunnable.run()
java.lang.Thread.run()
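
For anyone wanting to try the "serialize the clients" idea raised in this thread, here is a rough client-side sketch, not a tested recipe. The trace above runs through RequestHandlerUtils.handleCommit and DirectUpdateHandler2.commit, so the sketch also issues a single commit at the very end instead of one per request. Everything named here (SerializedIndexer, IndexSink, RecordingSink, indexSerially) is hypothetical and made up for illustration; IndexSink stands in for whatever code actually posts documents to Solr, and no real Solr API is called. The idea is only that several jobs can keep producing work concurrently while one worker thread sends requests one at a time.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class SerializedIndexer {

    /** Hypothetical stand-in for the code that really talks to Solr. */
    public interface IndexSink {
        void add(String docId) throws Exception;
        void commit() throws Exception;
    }

    /** Test double: records calls in order and notes any overlapping add() calls. */
    public static class RecordingSink implements IndexSink {
        public final List<String> calls = new ArrayList<>();
        public volatile boolean overlapped = false;
        private final AtomicInteger inFlight = new AtomicInteger();

        @Override
        public void add(String docId) {
            if (inFlight.incrementAndGet() > 1) {
                overlapped = true; // two add() calls were active at once
            }
            calls.add("add:" + docId);
            inFlight.decrementAndGet();
        }

        @Override
        public void commit() {
            calls.add("commit");
        }
    }

    /**
     * Each job runs on its own producer thread, but every add() is funnelled
     * through one worker thread, so the sink never sees concurrent requests.
     * A single commit() is issued after all adds have finished.
     */
    public static void indexSerially(List<List<String>> jobs, IndexSink sink) throws Exception {
        ExecutorService worker = Executors.newSingleThreadExecutor();
        List<Future<?>> results = Collections.synchronizedList(new ArrayList<>());
        List<Thread> producers = new ArrayList<>();

        for (List<String> job : jobs) {
            Thread producer = new Thread(() -> {
                for (String docId : job) {
                    // Returning null makes this a Callable, so add() may throw.
                    results.add(worker.submit(() -> {
                        sink.add(docId);
                        return null;
                    }));
                }
            });
            producers.add(producer);
            producer.start();
        }

        for (Thread producer : producers) {
            producer.join(); // all work has been queued once producers finish
        }
        worker.shutdown();
        worker.awaitTermination(1, TimeUnit.HOURS);

        for (Future<?> result : results) {
            result.get(); // rethrows the first failure, if any
        }
        sink.commit(); // one commit for the whole run
    }
}
```

Calling indexSerially with two or three small lists of ids and a RecordingSink should show every add arriving before the single commit, with no overlap. Whether serializing like this actually helps throughput on a real server is a separate question that would need measuring.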