I’d add that you’re abusing Solr horribly by returning 300K documents in a 
single go.

Solr is built to return the top N docs, where N is usually quite small (< 100).
If you allow an unlimited number of docs to be returned, you're simply kicking
the can down the road; somebody will ask for 1,000,000 docs at some point and
you'll be back where you started.

I _strongly_ recommend you do one of two things for such large result sets:

1> Use Streaming. If you're doing some kind of analytics, Streaming
    Expressions may well do what you want without your having to pull all
    those docs back to the client.

2> If you really, truly need all 300K docs, fetch them in chunks using
     CursorMark (see the sketch below).
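
For concreteness, here's a rough sketch rather than a drop-in solution; the
URL, collection name, query, fields, and page size below are placeholders
you'd replace with your own. A Streaming Expression against the /export
handler (which needs docValues on the fields involved) looks something like

    search(mycollection, q="your:query", fl="id,price_f", sort="id asc", qt="/export")

and a CursorMark loop in SolrJ looks roughly like this:

    import org.apache.solr.client.solrj.SolrQuery;
    import org.apache.solr.client.solrj.impl.HttpSolrClient;
    import org.apache.solr.client.solrj.response.QueryResponse;
    import org.apache.solr.common.params.CursorMarkParams;

    public class CursorWalk {
        public static void main(String[] args) throws Exception {
            // Placeholder URL/collection; point this at your own cluster.
            try (HttpSolrClient client = new HttpSolrClient.Builder(
                    "http://localhost:8983/solr/mycollection").build()) {
                SolrQuery q = new SolrQuery("your:query");   // your real query here
                q.setRows(500);                              // modest pages keep allocations small
                q.setSort(SolrQuery.SortClause.asc("id"));   // cursors require a sort on the uniqueKey
                String cursor = CursorMarkParams.CURSOR_MARK_START;
                boolean done = false;
                while (!done) {
                    q.set(CursorMarkParams.CURSOR_MARK_PARAM, cursor);
                    QueryResponse rsp = client.query(q);
                    rsp.getResults().forEach(doc -> {
                        // process each doc here instead of accumulating all 300K in memory
                    });
                    String next = rsp.getNextCursorMark();
                    done = cursor.equals(next);              // same mark twice means we're finished
                    cursor = next;
                }
            }
        }
    }

The important bits are the uniqueKey sort and feeding each response's
nextCursorMark back into the next request; unlike deep paging with start/rows,
a cursor does not get more expensive as you page deeper.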

Best,
Erick

> On Jul 13, 2020, at 10:03 PM, Odysci <ody...@gmail.com> wrote:
> 
> Shawn,
> 
> thanks for the extra info.
> The OOM errors were indeed because of heap space. In my case most of the GC
> calls were not full GCs; a full GC was done only when the heap was really
> near the top.
> I'll try out your suggestion of increasing the G1 heap region size. I've
> been using 4m, and from what you said, a 2m allocation would be considered
> humongous. My test cases have a few allocations that are definitely bigger
> than 2m (estimating based on the number of docs returned), but most of them
> are not.
> 
> When I was using maxRamMB, the size used was "compatible" with the size
> values, assuming the average 2KB docs that our index has.
> As far as I could tell in my runs, removing maxRamMB did change the GC
> behavior for the better. That is, the heap now goes up and down as expected,
> whereas before (with maxRamMB) it seemed to increase continuously.
> Thanks
> 
> Reinaldo
> 
> On Sun, Jul 12, 2020 at 1:02 AM Shawn Heisey <apa...@elyograg.org> wrote:
> 
>> On 6/25/2020 2:08 PM, Odysci wrote:
>>> I have a solrcloud setup with 12GB heap and I've been trying to optimize
>>> it to avoid OOM errors. My index has about 30 million docs and about 80GB
>>> total, 2 shards, 2 replicas.
>> 
>> Have you seen the full OutOfMemoryError exception text?  OOME can be
>> caused by problems that are not actually memory-related.  Unless the
>> error specifically mentions "heap space" we might be chasing the wrong
>> thing here.
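>> 
>> For example, genuine heap exhaustion reads
>> 
>>     java.lang.OutOfMemoryError: Java heap space
>> 
>> while something like
>> 
>>     java.lang.OutOfMemoryError: unable to create new native thread
>> 
>> points at an OS process/thread limit rather than heap size, and no amount
>> of heap tuning will fix that.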
>> 
>>> When the queries return a smallish number of docs (say, below 1000), the
>>> heap behavior seems "normal". Monitoring the GC log, I see that the young
>>> generation grows, then goes down considerably when GC kicks in, and the
>>> old generation grows just a bit.
>>> 
>>> However, at some point I have a query that returns over 300K docs (for a
>>> total size of approximately 1GB). At this very point the OLD generation
>>> size grows (by almost 2GB), and it remains high from then on.
>>> Even as new queries are executed, the OLD generation size does not go
>>> down, despite multiple GC calls done afterwards.
>> 
>> Assuming the OOME exceptions were indeed caused by running out of heap,
>> then the following paragraphs will apply:
>> 
>> G1 has this concept called "humongous allocations".  In order to reach
>> this designation, a memory allocation must get to half of the G1 heap
>> region size.  You have set this to 4 megabytes, so any allocation of 2
>> megabytes or larger is humongous.  Humongous allocations bypass the new
>> generation entirely and go directly into the old generation.  The max
>> value that can be set for the G1 region size is 32MB.  If you increase
>> the region size and the behavior changes, then humongous allocations
>> could be something to investigate.
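>> 
>> For illustration only (the value here is something to experiment with, not
>> a recommendation), a region-size override in solr.in.sh might look like:
>> 
>>     GC_TUNE="-XX:+UseG1GC -XX:G1HeapRegionSize=16m -XX:+ParallelRefProcEnabled"
>> 
>> With 16 MB regions, only allocations of 8 MB or more are treated as
>> humongous.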
>> 
>> In the versions of Java that I have used, humongous allocations can only
>> be reclaimed as garbage by a full GC.  I do not know if Oracle has
>> changed this so the smaller collections will do it or not.
>> 
>> Were any of those multiple GCs a Full GC?  If they were, then there is
>> probably little or no garbage to collect.  You've gotten a reply from
>> "Zisis T." with some possible causes for this.  I do not have anything
>> to add.
>> 
>> I did not know about any problems with maxRamMB ... but if I were
>> attempting to limit cache sizes, I would do so by the size values, not a
>> specific RAM size.  The size values you have chosen (8192 and 16384)
>> will most likely result in a total cache size well beyond the limits
>> you've indicated with maxRamMB.  So if there are any bugs in the code
>> with the maxRamMB parameter, you might end up using a LOT of memory that
>> you didn't expect to be using.
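>> 
>> As a sketch only, limiting by entry count instead of maxRamMB would look
>> something like this in solrconfig.xml. The numbers are placeholders, and
>> the cache class depends on your Solr version (CaffeineCache on recent
>> releases, FastLRUCache/LRUCache on older ones):
>> 
>>     <filterCache class="solr.CaffeineCache" size="512"
>>                  initialSize="512" autowarmCount="128"/>
>>     <queryResultCache class="solr.CaffeineCache" size="512"
>>                       initialSize="512" autowarmCount="64"/>
>> 
>> Keep in mind that a filterCache entry can be a bitset of maxDoc/8 bytes;
>> on a shard with roughly 15 million docs that is close to 2 MB per entry,
>> so 8192 entries could approach 16 GB, far beyond a 12 GB heap.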
>> 
>> Thanks,
>> Shawn
>> 
