There are 400 million documents in the shard, and each document is less than 1 KB; the stored-fields data file (*.fdt) is 149 GB. Does recovery need a large amount of memory while downloading the files, or only after the download finishes?
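For reference, my mental model of the download step (this is only my assumption, I have not checked the Solr replication code) is a plain bounded-buffer copy to disk, roughly like the sketch below, which would only ever hold one small buffer in the heap regardless of the 149 GB file size. Please correct me if the peer buffers whole files, or the whole index, in memory during recovery.

import java.io.InputStream;
import java.io.OutputStream;

// Hedged sketch of how I assume the peer pulls a large segment file from the
// leader: a fixed-size buffer is reused, so heap usage during the download
// should be bounded by the buffer, not by the size of the .fdt file.
public final class ChunkedCopy {
    public static long copy(InputStream from, OutputStream to) throws Exception {
        byte[] buf = new byte[1024 * 1024];   // 1 MB reusable buffer (arbitrary size)
        long total = 0;
        int n;
        while ((n = from.read(buf)) != -1) {  // stream until the remote side closes the file
            to.write(buf, 0, n);
            total += n;
        }
        to.flush();
        return total;                         // bytes written to the local index directory
    }
}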
I found some log entries just before the OOM, shown below:

Aug 06, 2012 9:43:04 AM org.apache.solr.core.SolrCore execute
INFO: [blog] webapp=/solr path=/select params={sort=createdAt+desc&distrib=false&collection=today,blog&hl.fl=content&wt=javabin&hl=false&rows=10&version=2&f.content.hl.fragsize=0&fl=id&shard.url=index35:8983/solr/blog/&NOW=1344217556702&start=0&q=((("somewordsA"+%26%26+"somewordsB"+%26%26+"somewordsC")+%26%26+platform:abc)+||+id:"/")+%26%26+(createdAt:[2012-07-30T01:43:28.462Z+TO+2012-08-06T01:43:28.462Z])&_system=business&isShard=true&fsv=true&f.title.hl.fragsize=0} hits=0 status=0 QTime=95
Aug 06, 2012 9:43:05 AM org.apache.solr.core.SolrDeletionPolicy onInit
INFO: SolrDeletionPolicy.onInit: commits:num=1 commit{dir=/home/ant/jetty/solr/data/index.20120801114027,segFN=segments_aui,generation=14058,filenames=[_cdnu_nrm.cfs, _cdnu_0.frq, segments_aui, _cdnu.fdt, _cdnu_nrm.cfe, _cdnu_0.tim, _cdnu.fdx, _cdnu.fnm, _cdnu_0.prx, _cdnu_0.tip, _cdnu.per]
Aug 06, 2012 9:43:05 AM org.apache.solr.core.SolrDeletionPolicy updateCommits
INFO: newest commit = 14058
Aug 06, 2012 9:43:05 AM org.apache.solr.update.DirectUpdateHandler2 commit
INFO: start commit{flags=0,version=0,optimize=false,openSearcher=true,waitSearcher=true,expungeDeletes=false,softCommit=false}
Aug 06, 2012 9:43:05 AM org.apache.solr.search.SolrIndexSearcher <init>
INFO: Opening Searcher@13578a09 main
Aug 06, 2012 9:43:05 AM org.apache.solr.core.QuerySenderListener newSearcher
INFO: QuerySenderListener sending requests to Searcher@13578a09main{StandardDirectoryReader(segments_aui:1269420 _cdnu(4.0):C457041702)}
Aug 06, 2012 9:43:05 AM org.apache.solr.core.QuerySenderListener newSearcher
INFO: QuerySenderListener done.
Aug 06, 2012 9:43:05 AM org.apache.solr.core.SolrCore registerSearcher
INFO: [blog] Registered new searcher Searcher@13578a09main{StandardDirectoryReader(segments_aui:1269420 _cdnu(4.0):C457041702)}
Aug 06, 2012 9:43:05 AM org.apache.solr.update.DirectUpdateHandler2 commit
INFO: end_commit_flush
Aug 06, 2012 9:43:05 AM org.apache.solr.update.processor.LogUpdateProcessor finish
INFO: [blog] webapp=/solr path=/update params={waitSearcher=true&commit_end_point=true&wt=javabin&commit=true&version=2} {commit=} 0 1439
Aug 06, 2012 9:43:05 AM org.apache.solr.update.DirectUpdateHandler2 commit
INFO: start commit{flags=0,version=0,optimize=false,openSearcher=true,waitSearcher=true,expungeDeletes=false,softCommit=false}
Aug 06, 2012 9:43:05 AM org.apache.solr.search.SolrIndexSearcher <init>
INFO: Opening Searcher@1a630c4d main
Aug 06, 2012 9:43:05 AM org.apache.solr.core.QuerySenderListener newSearcher
INFO: QuerySenderListener sending requests to Searcher@1a630c4dmain{StandardDirectoryReader(segments_aui:1269420 _cdnu(4.0):C457041702)}
Aug 06, 2012 9:43:05 AM org.apache.solr.core.QuerySenderListener newSearcher
INFO: QuerySenderListener done.
Aug 06, 2012 9:43:05 AM org.apache.solr.core.SolrCore registerSearcher
INFO: [blog] Registered new searcher Searcher@1a630c4dmain{StandardDirectoryReader(segments_aui:1269420 _cdnu(4.0):C457041702)}
Aug 06, 2012 9:43:05 AM org.apache.solr.update.DirectUpdateHandler2 commit
INFO: end_commit_flush
Aug 06, 2012 9:43:07 AM org.apache.solr.core.SolrCore execute
INFO: [blog] webapp=/solr path=/select params={sort=createdAt+desc&distrib=false&collection=today,blog&hl.fl=content&wt=javabin&hl=false&rows=10&version=2&f.content.hl.fragsize=0&fl=id&shard.url=index35:8983/solr/blog/&NOW=1344217558778&start=0&_system=business&q=(((somewordsD)+%26%26+platform:(abc))+||+id:"/")+%26%26+(createdAt:[2012-07-30T01:43:30.537Z+TO+2012-08-06T01:43:30.537Z])&isShard=true&fsv=true&f.title.hl.fragsize=0} hits=0 status=0 QTime=490

Apart from these entries, everything else logged in the few minutes before the OOM is "path=/select ******" requests; there were no add-document requests in this cluster during that time. Is that related to the OOM? (I put a rough per-query bitset size estimate at the end of this mail.) This is live traffic, so I can't experiment with it freely. Tonight I will add the -XX:+HeapDumpOnOutOfMemoryError option; if the problem appears again I will get a heap dump, but I am not sure I can analyse it and reach a conclusion, so I may ask for your help. Thanks.

2012/8/8 Yonik Seeley <yo...@lucidimagination.com>

> Stack trace looks normal - it's just a multi-term query instantiating
> a bitset. The memory is being taken up somewhere else.
> How many documents are in your index?
> Can you get a heap dump or use some other memory profiler to see
> what's taking up the space?
>
> > if I stop query more then ten minutes, the solr instance will start
> > normally.
>
> Maybe queries are piling up in threads before the server is ready to
> handle them and then trying to handle them all at once gives an OOM?
> Is this live traffic or a test? How many concurrent requests get sent?
>
> -Yonik
> http://lucidimagination.com
>
>
> On Wed, Aug 8, 2012 at 2:43 AM, Jam Luo <cooljam2...@gmail.com> wrote:
> > Aug 06, 2012 10:05:55 AM org.apache.solr.common.SolrException log
> > SEVERE: null:java.lang.RuntimeException: java.lang.OutOfMemoryError: Java heap space
> >         at org.apache.solr.servlet.SolrDispatchFilter.sendError(SolrDispatchFilter.java:456)
> >         at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:284)
> >         at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1337)
> >         at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:484)
> >         at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:119)
> >         at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:499)
> >         at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:233)
> >         at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1065)
> >         at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:413)
> >         at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:192)
> >         at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:999)
> >         at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:117)
> >         at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:250)
> >         at org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:149)
> >         at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:111)
> >         at org.eclipse.jetty.server.Server.handle(Server.java:351)
> >         at org.eclipse.jetty.server.AbstractHttpConnection.handleRequest(AbstractHttpConnection.java:454)
> >         at org.eclipse.jetty.server.BlockingHttpConnection.handleRequest(BlockingHttpConnection.java:47)
> >         at org.eclipse.jetty.server.AbstractHttpConnection.content(AbstractHttpConnection.java:900)
> >         at org.eclipse.jetty.server.AbstractHttpConnection$RequestHandler.content(AbstractHttpConnection.java:954)
> >         at org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:857)
> >         at org.eclipse.jetty.http.HttpParser.parseAvailable(HttpParser.java:235)
> >         at org.eclipse.jetty.server.BlockingHttpConnection.handle(BlockingHttpConnection.java:66)
> >         at org.eclipse.jetty.server.bio.SocketConnector$ConnectorEndPoint.run(SocketConnector.java:254)
> >         at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:599)
> >         at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:534)
> >         at java.lang.Thread.run(Thread.java:722)
> > Caused by: java.lang.OutOfMemoryError: Java heap space
> >         at org.apache.lucene.util.FixedBitSet.<init>(FixedBitSet.java:54)
> >         at org.apache.lucene.search.MultiTermQueryWrapperFilter.getDocIdSet(MultiTermQueryWrapperFilter.java:104)
> >         at org.apache.lucene.search.ConstantScoreQuery$ConstantWeight.scorer(ConstantScoreQuery.java:129)
> >         at org.apache.lucene.search.BooleanQuery$BooleanWeight.scorer(BooleanQuery.java:318)
> >         at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:507)
> >         at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:280)
> >         at org.apache.solr.search.SolrIndexSearcher.getDocListNC(SolrIndexSearcher.java:1394)
> >         at org.apache.solr.search.SolrIndexSearcher.getDocListC(SolrIndexSearcher.java:1269)
> >         at org.apache.solr.search.SolrIndexSearcher.search(SolrIndexSearcher.java:384)
> >         at org.apache.solr.handler.component.QueryComponent.process(QueryComponent.java:420)
> >         at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:204)
> >         at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129)
> >         at org.apache.solr.core.SolrCore.execute(SolrCore.java:1544)
> >         at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:442)
> >         at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:263)
> >         at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1337)
> >         at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:484)
> >         at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:119)
> >         at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:499)
> >         at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:233)
> >         at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1065)
> >         at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:413)
> >         at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:192)
> >         at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:999)
> >         at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:117)
> >         at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:250)
> >         at org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:149)
> >         at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:111)
> >         at org.eclipse.jetty.server.Server.handle(Server.java:351)
> >         at org.eclipse.jetty.server.AbstractHttpConnection.handleRequest(AbstractHttpConnection.java:454)
> >         at org.eclipse.jetty.server.BlockingHttpConnection.handleRequest(BlockingHttpConnection.java:47)
> >         at org.eclipse.jetty.server.AbstractHttpConnection.content(AbstractHttpConnection.java:900)
> >
> > This error often appear at the startup, no data write to the index,
> > but it have a lot of query request. if I stop query more then ten
> > minutes, the solr instance will start normally.
> > My index data in solr data directory is 200g+, RAM is 16g, jvm
> > properties is
> > -Xmx10g
> > -Xss256k
> > -Xmn512m
> > -XX:+UseCompressedOops
> > The OOM and the peer startup fail may be uncorrelated, but this two
> > things often happen in the same solr instance and the same time.
> >
> > I can provide the full log file if you want.
> >
> > thanks
> >
> > 2012/8/7 Mark Miller <markrmil...@gmail.com>
> >
> >> Still no idea on the OOM - please send the stacktrace if you can.
> >>
> >> As for doing a replication recovery when it should not be necessary, yonik
> >> just committed a fix for that a bit ago.
> >>
> >> On Aug 7, 2012, at 9:41 AM, Mark Miller <markrmil...@gmail.com> wrote:
> >>
> >> >
> >> > On Aug 7, 2012, at 5:49 AM, Jam Luo <cooljam2...@gmail.com> wrote:
> >> >
> >> >> Hi
> >> >> I have big index data files more then 200g, there are two solr
> >> >> instance in a shard. leader startup and is ok, but the peer alway OOM
> >> >> when it startup.
> >> >
> >> > Can you share the OOM msg and stacktrace please?
> >> >
> >> >> The peer alway download index files from leader because
> >> >> of recoveringAfterStartup property in RecoveryStrategy, total time taken
> >> >> for download : 2350 secs. if data of the peer is empty, it is ok, but the
> >> >> leader and the peer have a same generation number, why the peer
> >> >> do recovering?
> >> >
> >> > We are looking into this.
> >> >
> >> >>
> >> >> thanks
> >> >> cooljam
> >> >
> >> > - Mark Miller
> >> > lucidimagination.com
> >>
> >> - Mark Miller
> >> lucidimagination.com
> >>
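Following up on the "queries piling up" idea from Yonik's mail quoted above, here is the rough per-query estimate I mentioned after the log excerpt. It assumes each range/multi-term query allocates one FixedBitSet sized to maxDoc, as the stack trace suggests; maxDoc is taken from the searcher log line (_cdnu(4.0):C457041702), and the 100-concurrent-queries figure is only an illustration, not a measurement.

// Back-of-envelope estimate only; assumes one maxDoc-sized FixedBitSet per query.
public final class BitSetEstimate {
    public static void main(String[] args) {
        long maxDoc = 457041702L;              // from StandardDirectoryReader(... _cdnu(4.0):C457041702)
        long words  = (maxDoc + 63) / 64;      // FixedBitSet backs the bits with a long[]
        double mbPerQuery = words * 8 / (1024.0 * 1024.0);
        int queuedQueries = 100;               // illustrative number of piled-up requests
        System.out.printf("~%.0f MB per bitset, ~%.1f GB if %d queries run at once%n",
                mbPerQuery, queuedQueries * mbPerQuery / 1024.0, queuedQueries);
    }
}

That works out to roughly 55 MB per bitset, so a burst of such queries arriving right at startup could plausibly take a large slice of the 10 GB heap on top of caches and norms, if my arithmetic is right.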