On Thu, Oct 1, 2009 at 4:05 PM, Yonik Seeley <yo...@lucidimagination.com> wrote:
> On Thu, Oct 1, 2009 at 3:37 PM, Mark Miller <markrmil...@gmail.com> wrote:
>> Still interested in seeing his field sanity output to see what's possibly
>> being doubled.
>
> Strangely enough, I'm having a hard time seeing caching at the different 
> levels.
> I made a multi-segment index (2 segments), and then did a sort and facet:
> http://localhost:8983/solr/select?q=*:*&sort=popularity%20desc&facet=true&facet.field=popularity
>
> Seems like that should do it, but the statistics fieldCache section
> shows only 2 entries.
>  entries_count : 2
> entry#0 : 'org.apache.lucene.index.CompoundFileReader$CSIndexInput@5b38d7'=>'popularity',int,org.apache.lucene.search.FieldCache.NUMERIC_UTILS_INT_PARSER=>[I#949587 (size =~ 92 bytes)
> entry#1 : 'org.apache.lucene.index.CompoundFileReader$CSIndexInput@1582a7c'=>'popularity',int,org.apache.lucene.search.FieldCache.NUMERIC_UTILS_INT_PARSER=>[I#3534544 (size =~ 28 bytes)
> insanity_count : 0
>
> Investigating further...
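
For reference, those stats come straight out of Lucene 2.9's FieldCache
introspection.  A minimal sketch of the same dump -- roughly what Solr's
fieldCache statistics section is doing; FieldCacheDump is just a
throwaway name, and it assumes the cache has already been populated by
searches in the same JVM:

    import org.apache.lucene.search.FieldCache;
    import org.apache.lucene.search.FieldCache.CacheEntry;
    import org.apache.lucene.util.FieldCacheSanityChecker;
    import org.apache.lucene.util.FieldCacheSanityChecker.Insanity;

    public class FieldCacheDump {
      public static void main(String[] args) {
        // One CacheEntry per (reader, field, parser) combination that has
        // been populated -- two segment readers above, so entries_count : 2.
        CacheEntry[] entries = FieldCache.DEFAULT.getCacheEntries();
        System.out.println("entries_count : " + entries.length);
        for (int i = 0; i < entries.length; i++) {
          entries[i].estimateSize();  // fills in the "size =~" estimate
          System.out.println("entry#" + i + " : " + entries[i]);
        }
        // Flags the same field cached against both a top-level reader and
        // its segment readers -- i.e. the doubled memory Mark is after.
        Insanity[] insanity = FieldCacheSanityChecker.checkSanity(entries);
        System.out.println("insanity_count : " + insanity.length);
      }
    }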

Ahhh, TrieField.isTokenized() returns true.
The facet code has
    boolean multiToken = sf.multiValued() || ft.isTokenized();
and if multiToken==true then it uses multi-valued faceting, which
doesn't use the field cache.

Since isTokenized() really reflects whether something is tokenized at the
Lucene level, perhaps we need something that specifies whether there is
more than one logical value per field value?  I'm drawing a blank on a
good name for such a method though...
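
Maybe something along these lines -- a sketch only, with
multiValuedFieldCache() as a placeholder name:

    // Separate "tokenized at the Lucene level" from "more than one
    // logical value per field value".
    abstract class FieldType {
      /** Lucene-level: does the analysis chain emit multiple tokens? */
      abstract boolean isTokenized();

      /** Logical level: does one field value expand to more than one
       *  indexed value, so the single-valued FieldCache can't hold it?
       *  Defaults to the old behavior, leaving existing types unaffected. */
      boolean multiValuedFieldCache() {
        return isTokenized();
      }
    }

    // A trie field indexes several precision terms per value, so it is
    // "tokenized" at the Lucene level, but each document still has exactly
    // one logical value.
    class TrieField extends FieldType {
      boolean isTokenized() { return true; }
      boolean multiValuedFieldCache() { return false; }
    }

The facet check would then become

    boolean multiToken = sf.multiValued() || ft.multiValuedFieldCache();

and trie fields would take the single-valued FieldCache path.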

-Yonik
http://www.lucidimagination.com



> -Yonik
> http://www.lucidimagination.com
>
>> Yonik Seeley wrote:
>>> On Thu, Oct 1, 2009 at 3:14 PM, Mark Miller <markrmil...@gmail.com> wrote:
>>>
>>>> bq. Tons of changes since... including the per-segment
>>>> searching/sorting/function queries (I think).
>>>>
>>>> Yup. I actually didn't think so, because that was committed to Lucene in
>>>> February - but it didn't come into Solr till March 10th. March 5th just
>>>> ducked it.
>>>>
>>>
>>> Jeff said May 5th
>>>
>>> But it wasn't until the end of May that Solr started using Lucene's
>>> new sorting facilities that worked per-segment.
>>>
>>> -Yonik
>>> http://www.lucidimagination.com
>>>
>>>
>>>
>>>> Yonik Seeley wrote:
>>>>
>>>>> On Thu, Oct 1, 2009 at 11:41 AM, Jeff Newburn <jnewb...@zappos.com> wrote:
>>>>>
>>>>>
>>>>>> I am trying to update to the newest version of solr from trunk as of May
>>>>>> 5th.
>>>>>>
>>>>>>
>>>>> Tons of changes since... including the per-segment
>>>>> searching/sorting/function queries (I think).
>>>>>
>>>>> Do you sort on any single valued fields that you also facet on?
>>>>> Do you use ord() or rord() in any function queries?
>>>>>
>>>>> Unfortunately, some of these things will take up more memory because
>>>>> parts of the code still cache FieldCache elements against the top-level
>>>>> reader, while others use segment readers.  The direction is toward all
>>>>> segment readers, but we're not there yet (and won't be for 1.4).
>>>>> ord() and rord() will never be fixed... people need to migrate to
>>>>> something else.
>>>>>
>>>>> http://issues.apache.org/jira/browse/SOLR-1111 is the main issue for this.
>>>>>
>>>>> Of course, I've really only been talking about search-related changes.
>>>>>  Nothing on the indexing side should cause greater memory usage....
>>>>> but perhaps the indexing side could run out of memory due to the
>>>>> search side taking up more.
>>>>>
>>>>> -Yonik
>>>>> http://www.lucidimagination.com
>>>>>
>>>>>
>>>>>
>>>>>>  I updated and compiled from trunk as of yesterday (09/30/2009).  When
>>>>>> I try to do a full import I am receiving a GC heap error after changing
>>>>>> nothing in the configuration files.  Why would this happen in the most
>>>>>> recent versions but not in the version from a few months ago?  The stack
>>>>>> trace is below.
>>>>>>
>>>>>> Oct 1, 2009 8:34:32 AM org.apache.solr.update.processor.LogUpdateProcessor finish
>>>>>> INFO: {add=[166400, 166608, 166698, 166800, 166811, 167097, 167316, 167353, ...(83 more)]} 0 35991
>>>>>> Oct 1, 2009 8:34:32 AM org.apache.solr.common.SolrException log
>>>>>> SEVERE: java.lang.OutOfMemoryError: GC overhead limit exceeded
>>>>>>    at java.util.Arrays.copyOfRange(Arrays.java:3209)
>>>>>>    at java.lang.String.<init>(String.java:215)
>>>>>>    at com.ctc.wstx.util.TextBuffer.contentsAsString(TextBuffer.java:384)
>>>>>>    at com.ctc.wstx.sr.BasicStreamReader.getText(BasicStreamReader.java:821)
>>>>>>    at org.apache.solr.handler.XMLLoader.readDoc(XMLLoader.java:280)
>>>>>>    at org.apache.solr.handler.XMLLoader.processUpdate(XMLLoader.java:139)
>>>>>>    at org.apache.solr.handler.XMLLoader.load(XMLLoader.java:69)
>>>>>>    at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:54)
>>>>>>    at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
>>>>>>    at org.apache.solr.core.SolrCore.execute(SolrCore.java:1316)
>>>>>>    at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:338)
>>>>>>    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:241)
>>>>>>    at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
>>>>>>    at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
>>>>>>    at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
>>>>>>    at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:175)
>>>>>>    at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:128)
>>>>>>    at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
>>>>>>    at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
>>>>>>    at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:286)
>>>>>>    at org.apache.coyote.http11.Http11NioProcessor.process(Http11NioProcessor.java:879)
>>>>>>    at org.apache.coyote.http11.Http11NioProtocol$Http11ConnectionHandler.process(Http11NioProtocol.java:719)
>>>>>>    at org.apache.tomcat.util.net.NioEndpoint$SocketProcessor.run(NioEndpoint.java:2080)
>>>>>>    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>>>>>>    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>>>>>>    at java.lang.Thread.run(Thread.java:619)
>>>>>>
>>>>>> Oct 1, 2009 8:40:06 AM org.apache.solr.core.SolrCore execute
>>>>>> INFO: [zeta-main] webapp=/solr path=/update params={} status=500 QTime=5265
>>>>>> Oct 1, 2009 8:40:12 AM org.apache.solr.common.SolrException log
>>>>>> SEVERE: java.lang.OutOfMemoryError: GC overhead limit exceeded
>>>>>>
>>>>>> --
>>>>>> Jeff Newburn
>>>>>> Software Engineer, Zappos.com
>>>>>> jnewb...@zappos.com - 702-943-7562
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>> --
>>>> - Mark
>>>>
>>>> http://www.lucidimagination.com
>>>>
>>>>
>>>>
>>>>
>>>>
>>
>>
>> --
>> - Mark
>>
>> http://www.lucidimagination.com
>>
>>
>>
>>
>
