high CPU usage and SelectCannelConnector threads used a lot
Hi, I'm using solr and have been load testing it for around 4 days. We use the solrj client to communicate with a separate jetty based solr process on the same box. After a few days solr's CPU% is now consistently at or above 100% (multiple processors available) and the application using it is mostly not responding because it times out talking to solr. I connected visual VM to the solr JVM and found that out of the many btpool-# threads there are 4 that are pretty much stuck in the running state 100% of the time. Their names are btpool0-1-Acceptor1 SelectChannelConnector @0.0.0.0:9983 btpool0-2-Acceptor2 SelectChannelConnector @0.0.0.0:9983 btpool0-3-Acceptor3 SelectChannelConnector @0.0.0.0:9983 btpool0-9-Acceptor0 SelectChannelConnector @0.0.0.0:9983 The stacks are all the same "btpool0-2 - Acceptor2 SelectChannelConnector @ 0.0.0.0:9983" - Thread t...@27 java.lang.Thread.State: RUNNABLE at sun.nio.ch.EPollArrayWrapper.epollWait(Native Method) at sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:210) at sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:65) at sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:69) - locked <106a644> (a sun.nio.ch.Util$1) - locked <18dd381> (a java.util.Collections$UnmodifiableSet) - locked <38d07d> (a sun.nio.ch.EPollSelectorImpl) at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:80) at org.mortbay.io.nio.SelectorManager$SelectSet.doSelect(SelectorManager.java:419) at org.mortbay.io.nio.SelectorManager.doSelect(SelectorManager.java:169) at org.mortbay.jetty.nio.SelectChannelConnector.accept(SelectChannelConnector.java:124) at org.mortbay.jetty.AbstractConnector$Acceptor.run(AbstractConnector.java:516) at org.mortbay.thread.BoundedThreadPool$PoolThread.run(BoundedThreadPool.java:442) Locked ownable synchronizers: - None All of the other idle thread pool threads are just waiting for new tasks. The active threads never seem to change, its always these 4. The selector channel appears to be in the jetty code, receiving requests from our other process through the solrj client. Does anyone know what this might mean or how to address it? Are these running all the time because they are blocked on IO so not actually consuming CPU? If so, what else might be? Is there a better way to figure out what is pinning the CPU? Some more info that might be useful. 32 bit machine ( I know, I know) 2.7GB of RAM for solr process ~2.5 is "used" According to visual VM around 25% of CPU time is spent in GC with the rest in application. Thanks for the help. John
OutOfMemory GC: GC overhead limit exceeded - Why isn't WeakHashMap getting collected?
I have been load testing solr 1.4.1 and have been running into OOM errors. Not out of heap but with the GC overhead limit exceeded message meaning that it didn't actually run out of heap space but just spent too much CPU time trying to make room and gave up. I got a heap dump and sent it through the Eclipse MAT and found that a single WeakHashMap in FieldCacheImpl called readerCache is taking up 2.1GB of my 2.6GB heap. >From my understanding of WeakHashMaps the GC should be able to collect those references if it needs to but for some reason it isn't here. My questions are: 1) Any ideas why the GC is not collecting those weak references in that single hashmap? 2) Is there a nob in the solr config that can limit the size of that cache? Also, after the OOM is thrown solr doesn't respond much at all and throws the exception below, however when I go to the code I see this try { processor.processAdd(addCmd); addCmd.clear(); } catch (IOException e) { throw new SolrException(SolrException.ErrorCode.SERVER_ERROR, "ERROR adding document " + document); } } So its swallowing the IOException and throwing a new one without setting the cause so I can't see what the IOException is. Is this fixed in any newer version? Should I open a bug? Thanks a lot for your help John SEVERE: org.apache.solr.common.SolrException: ERROR adding document SolrInputDocument[{de.id=de.id(1.0)={C2B3B03F112C549254560A568C18}, de.type=de.type(1.0)={Social Contact}, sc.author=sc.author(1.0)={Author-3944}, sc.sourceType=sc.sourceType(1.0)={rss}, sc.link=sc.link(1.0)={ http://www.cisco.com/feed/date_12.07.10_16.18.03/idx/107 52}, sc.title=sc.title(1.0)={Title-erat metus eget vestibulum}, sc.publishedDate=sc.publishedDate(1.0)={Tue Dec 07 16:22:09 EST 2010}, sc.createdDate=sc.createdDate(1.0 )={Tue Dec 07 16:20:20 EST 2010}, sc.socialContactStatus=sc.socialContactStatus(1.0)={unread}, sc.socialContactStatusUserId=sc.socialContactStatusUserId(1.0)={}, sc.soc ialContactStatusDate=sc.socialContactStatusDate(1.0)={Tue Dec 07 16:20:20 EST 2010}, sc.tags=sc.tags(1.0)={[]}, sc.authorId=sc.authorId(1.0)={}, sc.replyToId=sc.replyTo Id(1.0)={}, sc.replyToAuthor=sc.replyToAuthor(1.0)={}, sc.replyToAuthorId=sc.replyToAuthorId(1.0)={}, sc.feedId=sc.feedId(1.0)={[124852]}, filterResult_124932_ti=filter Result_124932_ti(1.0)={67}, filterStatus_124932_s=filterStatus_124932_s(1.0)={COMPLETED}, filterResult_124937_ti=filterResult_124937_ti(1.0)={67}, filterStatus_124937_s =filterStatus_124937_s(1.0)={COMPLETED}, campaignDateAdded_124957_tdt=campaignDateAdded_124957_tdt(1.0)={Tue Dec 07 16:20:20 EST 2010}, campaignStatus_124957_s=campaign Status_124957_s(1.0)={NEW}, campaignDateAdded_124947_tdt=campaignDateAdded_124947_tdt(1.0)={Tue Dec 07 16:20:20 EST 2010}, campaignStatus_124947_s=campaignStatus_124947 _s(1.0)={NEW}, sc.campaignResultsSummary=sc.campaignResultsSummary(1.0)={[NEW, NEW]}}] at org.apache.solr.handler.BinaryUpdateRequestHandler$2.document(BinaryUpdateRequestHandler.java:81) at org.apache.solr.client.solrj.request.JavaBinUpdateRequestCodec$2.readOuterMostDocIterator(JavaBinUpdateRequestCodec.java:136) at org.apache.solr.client.solrj.request.JavaBinUpdateRequestCodec$2.readIterator(JavaBinUpdateRequestCodec.java:126) at org.apache.solr.common.util.JavaBinCodec.readVal(JavaBinCodec.java:210) at org.apache.solr.client.solrj.request.JavaBinUpdateRequestCodec$2.readNamedList(JavaBinUpdateRequestCodec.java:112) at org.apache.solr.common.util.JavaBinCodec.readVal(JavaBinCodec.java:175) at org.apache.solr.common.util.JavaBinCodec.unmarshal(JavaBinCodec.java:101) at org.apache.solr.client.solrj.request.JavaBinUpdateRequestCodec.unmarshal(JavaBinUpdateRequestCodec.java:141) at org.apache.solr.handler.BinaryUpdateRequestHandler.parseAndLoadDocs(BinaryUpdateRequestHandler.java:68) at org.apache.solr.handler.BinaryUpdateRequestHandler.access$000(BinaryUpdateRequestHandler.java:46) at org.apache.solr.handler.BinaryUpdateRequestHandler$1.load(BinaryUpdateRequestHandler.java:55) at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:54) at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131) at org.apache.solr.core.SolrCore.execute(SolrCore.java:1316) at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:338) at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:241) at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1089) at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:365) at org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216) at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler
Re: OutOfMemory GC: GC overhead limit exceeded - Why isn't WeakHashMap getting collected?
Thanks a lot for the response. Unfortunately I can't check the statistics page. For some reason the solr webapp itself is only returning a directory listing. This is sometimes fixed when I restart but if I do that I'll lose the state I have now. I can get at the JMX interface. Can I check my insanity level from there? We did change two parts of the solr config to raise the size of the query Results and document cache. I assume from what you were saying that this does not have an effect on the cache I mentioned taking up all of the space. This problem gets worse as our index grows (5.0GB now). Unfortunately we are maxed out on memory for our hardware. We aren't using faceting at all in our searches right now. We usually sort on 1 or 2 fields at the most. I think the types of our fields are pretty accurate, unfortunately they are mostly strings, and some dates. How do the field definitions effect that cache? Is it simply that fewer fields mean less to cache? Does it not cache some fields configured in a certain way? Is there a way to throw out an IndexReader after a while and restart, just to restart the cache? Or maybe explicitly clear it if we see it getting out of hand through JMX or something? Really anything to stop it from choking like this would be awesome. Thanks again. John On Fri, Dec 10, 2010 at 16:02, Tom Hill wrote: > Hi John, > > WeakReferences allow things to get GC'd, if there are no other > references to the object referred to. > > My understanding is that WeakHashMaps use weak references for the Keys > in the HashMap. > > What this means is that the keys in HashMap can be GC'd, once there > are no other references to the key. I _think_ this occurs when the > IndexReader is closed. > > It does not mean that objects in the FieldCache will get evicted in > low memory conditions, unless that field cache entry is no longer > needed (i.e. the IndexReader has closed). It just means they can be > collected, when they are no longer needed (but not before). > > So, if you are seeing the FieldCache for the current IndexReader > taking up 2.1, that's probably for the current cache usage. > > There isn't a "knob" you can turn to cut the cache size, but you can > evaluate your usage of the cache. Some ideas: > > How many fields are you searching on? Sorting on? Are you sorting on > String fields, where you could be using a numeric field? Numerics save > space. Do you need to sort on every field that you are sorting on? > Could you facet on fewer fields? For a String field, do you have too > many distinct values? If so, can you reduce the number or unique > terms? You might check your faceting algorithms, and see if you could > use enum, instead of fc for some of them. > > Check your statistics page, what's your insanity count? > > Tom > > > > On Fri, Dec 10, 2010 at 12:17 PM, John Russell > wrote: > > I have been load testing solr 1.4.1 and have been running into OOM > errors. > > Not out of heap but with the GC overhead limit exceeded message meaning > that > > it didn't actually run out of heap space but just spent too much CPU time > > trying to make room and gave up. > > > > I got a heap dump and sent it through the Eclipse MAT and found that a > > single WeakHashMap in FieldCacheImpl called readerCache is taking up > 2.1GB > > of my 2.6GB heap. > > > > From my understanding of WeakHashMaps the GC should be able to collect > those > > references if it needs to but for some reason it isn't here. > > > > My questions are: > > > > 1) Any ideas why the GC is not collecting those weak references in that > > single hashmap? > > 2) Is there a nob in the solr config that can limit the size of that > cache? > > > > > > Also, after the OOM is thrown solr doesn't respond much at all and throws > > the exception below, however when I go to the code I see this > > > > try { > > processor.processAdd(addCmd); > > addCmd.clear(); > >} catch (IOException e) { > > throw new > > SolrException(SolrException.ErrorCode.SERVER_ERROR, "ERROR adding > document " > > + document); > >} > > } > > > > So its swallowing the IOException and throwing a new one without setting > > the cause so I can't see what the IOException is. Is this fixed in any > > newer version? Should I open a bug? > > > > > > Thanks a lot for your help > > > > John > > > > > > SEVERE: org.apache.solr.common.SolrException: ERROR adding document > > SolrIn
Re: OutOfMemory GC: GC overhead limit exceeded - Why isn't WeakHashMap getting collected?
Thanks for the response. The date types are defined in our schema file like this Which appears to be what you mentioned. Then we use them in fields like this So I think we have the right datatypes for the dates. Most of the other ones are strings. As for the doc we are adding, I don't think it would be considered "huge". It is basically blog posts and tweets broken out into fields like author, title, summary etc. Each doc probably isn't more than 1 or 2k tops. Some probably smaller. We do create them once and then update the indexes as we perform work on the documents. For example, we create the doc for the original incoming post and then update that doc with tags or the results of filtering so we can look for them later. We have solr set up as a separate JVM which we talk to over HTTP on the same box using the solrj client java library. Unfortunately we are on 32 bit hardware so solr can only get 2.6GB of memory. Any more than that and the JVM won't start. I really just need a way to keep the cache from breaking the bank. As I pasted below there are some config elements in the XML that appear to be related to caching but I'm not sure that they are related to that specific hashmap which eventually grows to 2.1GB of our 2.6GB heap. It never actually runs out of heap space but GC's the CPU to death. Thanks again. John On Sat, Dec 11, 2010 at 17:46, Erick Erickson wrote: > "unfortunately I can't check the statistics page. For some reason the solr > webapp itself is only returning a directory listing." > > This is very weird and makes me wonder if there's something really wonky > with your system. I'm assuming when you say "the solr webapp itself" you're > taking about ...localhost:8983/solr/admin/.. You might want to be > looking > at the stats page (and frantically hitting refresh) before you have > problems. > Alternately, you could record the queries as they are sent to solr to see > what > the offending > > But onwards Tell us more about your dates. One of the very common > ways people get into trouble is to use dates that are unix-style > timestamps, > i.e. in milliseconds (either as ints or strings) and sort on them. Trie > fields > are very much preferred for this. > > Your index isn't all that large by regular standards, so I think that > there's > hope that you can get this working. > > > Wait, wait, wait. Looking again at the stack trace I see that your OOM > is happening when you *add* a document. Tell us more about the > document, perhaps you can print out some characteristics of the doc > before you add it? Is it always the same doc? Are you indexing and > searching on the same machine? Is the doc really huge? > > Best > Erick > > > On Fri, Dec 10, 2010 at 4:33 PM, John Russell wrote: > > > Thanks a lot for the response. > > > > Unfortunately I can't check the statistics page. For some reason the > solr > > webapp itself is only returning a directory listing. This is sometimes > > fixed when I restart but if I do that I'll lose the state I have now. I > > can > > get at the JMX interface. Can I check my insanity level from there? > > > > We did change two parts of the solr config to raise the size of the query > > Results and document cache. I assume from what you were saying that this > > does not have an effect on the cache I mentioned taking up all of the > > space. > > > >> > > class=*"solr.LRUCache"* > > > > size=*"16384"* > > > > initialSize=*"4096"* > > > > autowarmCount=*"0"*/> > > > > > > > > > > > > > > > class=*"solr.LRUCache"* > > > > size=*"16384"* > > > > initialSize=*"16384"* > > > > autowarmCount=*"0"*/> > > > > > > This problem gets worse as our index grows (5.0GB now). Unfortunately we > > are maxed out on memory for our hardware. > > > > We aren't using faceting at all in our searches right now. We usually > sort > > on 1 or 2 fields at the most. I think the types of our fields are pretty > > accurate, unfortunately they are mostly strings, and some dates. > > > > How do the field definitions effect that cache? Is it simply that fewer > > fields mean less to cache? Does it not cache some fields configured in a > > certain way? > > > > Is there a way to throw out an IndexReader after a while and restart, > just > > to restart the cache? Or maybe explicitly clear it if we
Re: OutOfMemory GC: GC overhead limit exceeded - Why isn't WeakHashMap getting collected?
Wow, you read my mind. We are committing very frequently. We are trying to get as close to realtime access to the stuff we put in as possible. Our current commit time is... ahem every 4 seconds. Is that insane? I'll try the ConcMarkSweep as well and see if that helps. On Mon, Dec 13, 2010 at 17:38, Jonathan Rochkind wrote: > Forgive me if I've said this in this thread already, but I'm beginning to > think this is the main 'mysterious' cause of Solr RAM/gc issues. > > Are you committing very frequently? So frequently that you commit faster > than it takes for warming operations on a new Solr index to complete, and > you're getting over-lapping indexes being prepared? > > But if the problem really is just GC issues and not actually too much RAM > being used, try this JVM setting: > > -XX:+UseConcMarkSweepGC > > Will make GC happen in a different thread, instead of the same thread as > solr operations. > > I think that is also something that many many Solr installations probably > need, but don't realize they need. > > > On 12/13/2010 3:42 PM, John Russell wrote: > >> Thanks for the response. >> >> The date types are defined in our schema file like this >> >> > precisionStep="0" positionIncrementGap="0"/> >> >> >> > precisionStep="6" positionIncrementGap="0"/> >> >> Which appears to be what you mentioned. Then we use them in fields like >> this >> >>> stored="false" >> required="false" multiValued="false" /> >>> required="false" multiValued="false" /> >> >> So I think we have the right datatypes for the dates. Most of the other >> ones are strings. >> >> As for the doc we are adding, I don't think it would be considered "huge". >> It is basically blog posts and tweets broken out into fields like author, >> title, summary etc. Each doc probably isn't more than 1 or 2k tops. Some >> probably smaller. >> >> We do create them once and then update the indexes as we perform work on >> the >> documents. For example, we create the doc for the original incoming post >> and then update that doc with tags or the results of filtering so we can >> look for them later. >> >> We have solr set up as a separate JVM which we talk to over HTTP on the >> same >> box using the solrj client java library. Unfortunately we are on 32 bit >> hardware so solr can only get 2.6GB of memory. Any more than that and the >> JVM won't start. >> >> I really just need a way to keep the cache from breaking the bank. As I >> pasted below there are some config elements in the XML that appear to be >> related to caching but I'm not sure that they are related to that specific >> hashmap which eventually grows to 2.1GB of our 2.6GB heap. It never >> actually runs out of heap space but GC's the CPU to death. >> >> Thanks again. >> >> John >> >> On Sat, Dec 11, 2010 at 17:46, Erick Erickson> >wrote: >> >> "unfortunately I can't check the statistics page. For some reason the >>> solr >>> webapp itself is only returning a directory listing." >>> >>> This is very weird and makes me wonder if there's something really wonky >>> with your system. I'm assuming when you say "the solr webapp itself" >>> you're >>> taking about ...localhost:8983/solr/admin/.. You might want to be >>> looking >>> at the stats page (and frantically hitting refresh) before you have >>> problems. >>> Alternately, you could record the queries as they are sent to solr to see >>> what >>> the offending >>> >>> But onwards Tell us more about your dates. One of the very common >>> ways people get into trouble is to use dates that are unix-style >>> timestamps, >>> i.e. in milliseconds (either as ints or strings) and sort on them. Trie >>> fields >>> are very much preferred for this. >>> >>> Your index isn't all that large by regular standards, so I think that >>> there's >>> hope that you can get this working. >>> >>> >>> Wait, wait, wait. Looking again at the stack trace I see that your OOM >>> is happening when you *add* a document. Tell us more about the >>> document, perhaps you can print out some characteristics of the doc >>> before you add it? Is it always the