Thanks for your response. Please see my detailed responses inline below.

On Tue, Apr 23, 2019 at 6:04 PM Shawn Heisey <apa...@elyograg.org> wrote:

> On 4/23/2019 6:34 AM, Brian Ecker wrote:
> > What I’m trying to determine is (1) How much heap does
> > this setup need before it stabilizes and stops crashing with OOM errors,
> > (2) can this requirement somehow be reduced so that we can use less
> > memory, and (3) from the heap histogram, what is actually using memory
> > (lots of primitive type arrays and data structures, but what part of
> > Solr is using those)?
>
> Exactly one attachment made it through:  The file named
> solrconfig-anonymized.xml.  Attachments can't be used to share files
> because the mailing list software is going to eat them and we won't see
> them.  You'll need to use a file sharing website.  Dropbox is often a
> good choice.
>

I see. The other files I meant to attach were the GC log (
https://pastebin.com/raw/qeuQwsyd), the heap histogram (
https://pastebin.com/raw/aapKTKTU), and the screenshot from top (
http://oi64.tinypic.com/21r0bk.jpg).

>
> We won't be able to tell anything about what's using all the memory from
> a histogram.  We would need an actual heap dump from Java.  This file
> will be huge -- if you have a 10GB heap, and that heap is full, the file
> will likely be larger than 10GB.


I'll work on getting the heap dump, but would it also be sufficient to use,
say, a 5GB dump taken when the heap is half full and then extrapolate to
its contents when full? That way the dump would be easier to work with.
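For what it's worth, one way to capture the dump is jmap from the JDK; a sketch (the PID and output path below are placeholders, not values from this thread):

```shell
# Write a binary heap dump of live (reachable) objects only; the
# "live" option triggers a full GC first, which keeps the file as
# small as possible. <solr-pid> and the output path are placeholders.
jmap -dump:live,format=b,file=/tmp/solr-heap.hprof <solr-pid>
```

The resulting .hprof file can then be opened in a tool such as Eclipse MAT.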

>
> There is no way for us to know how much heap you need.  With a large
> amount of information about your setup, we can make a guess, but that
> guess will probably be wrong.  Info we'll need to make a start:
>

I believe I already provided most of this information in my original post,
as I understand it's not trivial to make this assessment accurately. I'll
reiterate below, but please also see the original post, where I tried to
provide as much detail as possible.

>
> *) How many documents is this Solr instance handling?  You find this out
> by looking at every core and adding up all the "maxDoc" numbers.
>

There are around 2,100,000 documents.
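That total came from adding up maxDoc across cores. A small sketch of the summation, using a mocked-up Core Admin STATUS response (the core names and counts below are made up for illustration; the real numbers come from `/solr/admin/cores?action=STATUS&wt=json`):

```python
import json

# Mocked response in the shape returned by the Core Admin STATUS call:
# status -> <core name> -> index -> maxDoc. Values here are illustrative.
status_response = json.loads("""
{
  "status": {
    "core_en": {"index": {"maxDoc": 1200000, "numDocs": 1150000}},
    "core_es": {"index": {"maxDoc": 900000,  "numDocs": 870000}}
  }
}
""")

# Sum maxDoc (not numDocs) over every core, as suggested above.
total_max_doc = sum(core["index"]["maxDoc"]
                    for core in status_response["status"].values())
print(total_max_doc)  # 2100000
```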

>
> *) How much disk space is the index data taking?  This could be found
> either by getting a disk usage value for the solr home, or looking at
> every core and adding up the size of each one.
>

The data takes around 9GB on disk.
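I measured that with a disk-usage total over the Solr home (the path below is a common default, not necessarily yours):

```shell
# Total on-disk size of all index data under the Solr home.
# /var/solr/data is a typical default install path -- adjust as needed.
du -sh /var/solr/data
```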

>
> *) What kind of queries are you running?  Anything with facets, or
> grouping?  Are you using a lot of sort fields?


No facets, no grouping, and no sort fields. The application performs
full-text, complete-as-you-type search, mostly via prefix analyzers and
edge n-grams. We also make heavy use of spellchecking. An example of one
of the queries produced is the following:

?q=(single_value_f1:"baril" OR multivalue_f1:"baril")^=1
   (single_value_f2:(baril) OR multivalue_f2:(baril))^=0.5
&fl=score,myfield1,myfield2,myfield3:myfield3.ar
&bf=product(def(myfield3.ar,0),1)
&rows=200
&df=dummy
&spellcheck=on
&spellcheck.dictionary=spellchecker.es
&spellcheck.dictionary=spellchecker.und
&spellcheck.q=baril
&spellcheck.accuracy=0.5
&spellcheck.count=1
&fq=+myfield1:(100 OR 200 OR 500)
&fl=score&fl=myfield1&fl=myfield2&fl=myfield3:myfield3.ar


> *) What kind of data is in each document, and how large is that data?
>

The data contained is mostly 1-5 words of text in various fields and in
various languages. We apply different tokenizers and some language specific
analyzers for different fields, but almost every field is tokenized. There
are 215 fields in total, 77 of which are stored. Based on the index size on
disk and the number of documents, that works out to roughly 4.3 KB per
document on average.
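The per-document figure is just the two numbers above divided out:

```python
# Back-of-the-envelope average on-disk size per document, using the
# figures above: a 9 GB index over ~2.1 million documents.
index_bytes = 9 * 10**9      # 9 GB (decimal)
doc_count = 2_100_000

bytes_per_doc = index_bytes / doc_count
print(f"{bytes_per_doc / 1000:.1f} KB/doc")  # 4.3 KB/doc
```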

>
> Your cache sizes are reasonable.  So you can't reduce heap requirements
> by much by reducing cache sizes.
>
> Here's some info about what takes a lot of heap and ideas for reducing
> the requirements:
>
> https://wiki.apache.org/solr/SolrPerformanceProblems#Java_Heap


Thank you, but I've already seen that page, and it's part of why I'm
confused: most of the things it lists as typical heavy heap consumers
don't seem to apply to my setup.

>
>
> That page also reiterates what I said above:  It's unlikely that anybody
> will be able to tell you exactly how much heap you need at a minimum.
> We can make guesses, but those guesses might be wrong.
>
> Thanks,
> Shawn
>
