On 2/7/2018 5:20 AM, Maulin Rathod wrote:
> Further analyzing issue we found that asking for too many rows (e.g.
> rows=10000000) can cause full GC problem as mentioned in below link.
This is because when you ask for 10 million rows, Solr allocates a
memory structure capable of storing information for each of those 10
million rows, even before it knows how many documents are actually going
to match the query. This problem is discussed in the blog post from Toke
that you linked.
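If the goal really is to pull back a huge result set, it is much kinder
to the heap to page through it in smaller chunks. A rough sketch of
cursor-based paging, with purely illustrative parameter values and
assuming your uniqueKey field is named "id":

  # first request: sort must end on the uniqueKey field, cursorMark starts at *
  q=your_query&rows=500&sort=score desc,id asc&cursorMark=*
  # later requests: pass the nextCursorMark value returned by the previous response
  q=your_query&rows=500&sort=score desc,id asc&cursorMark=<value from previous response>

With cursorMark (or even plain start/rows paging), Solr only sizes its
result structures for the rows you actually request on each call.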
Bare wildcard queries can also lead to big problems with memory churn,
and are not recommended. Your query has a bare "*" included in it
FOURTEEN times, on the summary field. The name of that field suggests
that it will have a very high term count. If it does have a lot of
unique terms, then ONE wildcard is going to be horrifically slow and
consume a ton of memory. Fourteen of them is going to be particularly
insane. You've also got a number of wildcards with text prefixes, which
will not be as bad as the bare wildcard, but can still chew up a lot of
memory and time.
I suspect that the entire "summary" part of your query generation needs
to be reworked.
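If those bare wildcards on summary are only there to express "this field
must have a value", there are cheaper ways to say that. A hedged sketch
of possible substitutions, worth testing against your own data:

  summary:*          <- wildcard, has to enumerate the field's terms
  summary:[* TO *]   <- open-ended range query, often cheaper in practice
  has_summary:true   <- boolean field populated at index time, cheapest of all

The has_summary field is hypothetical -- it is only an illustration of
pushing that work to index time rather than query time.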
You also have wildcards in the part of the query on the "title" field.
The kind of query you are doing with wildcards can often be replaced
completely by ngram or edgengram filtering in the analysis chain, usually
with a big performance advantage.
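As a sketch of what that might look like in the schema (the type name,
field name, tokenizer, and gram sizes below are only placeholders to
adjust for your data):

  <fieldType name="text_edge" class="solr.TextField" positionIncrementGap="100">
    <analyzer type="index">
      <tokenizer class="solr.StandardTokenizerFactory"/>
      <filter class="solr.LowerCaseFilterFactory"/>
      <filter class="solr.EdgeNGramFilterFactory" minGramSize="2" maxGramSize="20"/>
    </analyzer>
    <analyzer type="query">
      <tokenizer class="solr.StandardTokenizerFactory"/>
      <filter class="solr.LowerCaseFilterFactory"/>
    </analyzer>
  </fieldType>

A field using a type like that lets a plain term query such as
title_edge:invoi match what title:invoi* matches today, except that the
expensive expansion happens once at index time instead of on every query,
at the cost of a larger index.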
I suspect that the large number of wildcards is a big part of why your
example query took 83 seconds to execute. There may have also been some
nasty GC pauses during the query.
You still have not answered the questions asked early in this thread
about memory. Is the heap 40GB, or is that the total memory installed
in the server? What is the total size of all Solr heaps on the machine,
how much total memory is in the server, and how much index data (both
document count and disk space size) is being handled by all the Solr
instances on that machine?
The portion of your GC log that you included is too short, and has also
been completely mangled by being pasted into an email. If you want it
analyzed, we will need a full copy of the logfile, without any
modification, which likely means you need to use a file sharing site to
transport it.
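If Solr was started with the scripts included in the download, GC logging
should already be turned on, and the full file normally lives next to
Solr's other logs. On a default install that is usually something like:

  server/logs/solr_gc.log

The exact name and location can vary a little between versions and with
how SOLR_LOGS_DIR is set.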
What I *can* decipher from your GC log suggests that your heap size may
actually be 48GB, not 40GB. After the big GC event, there was a little
over 17GB of heap memory still in use. So my first bit of advice is to
try reducing the heap size. Without a larger GC log, my current thought
is to make it half what it currently is -- 24GB. With a more extensive
GC log, I could make a more accurate recommendation.
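If you are starting Solr with the included scripts, the heap size is
normally controlled from the include file. A minimal sketch, assuming a
solr.in.sh-based install (solr.in.cmd on Windows):

  # solr.in.sh -- sets both -Xms and -Xmx for the Solr JVM
  SOLR_HEAP="24g"

If you set the heap some other way, adjust whatever -Xms/-Xmx values you
are currently passing instead.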
My second bit of advice would be to eliminate as many wildcards from your
query as you can. If your queries are producing the correct results, then
I suspect that the "summary" part of your query example is quite possibly
completely unnecessary, and it is going to require a LOT of memory.
Additional advice, not really related to the main discussion:
Some of the query looks like it is a perfect candidate for extraction
into filter queries. Any portion of the query that is particularly
static is probably going to benefit from being changed into a filter
query. Possible filters you could use based on what I see:
fq=isFolderActive:true
fq=isXref:false
fq=*:* -document_type_id:(3 7)
If your index activity is well-suited for heavy filterCache usage,
filters like this can achieve incredible speedups.
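To illustrate how those fit together with the rest of the request (the q
value here is just a stand-in for your real query):

  q=<the rest of your query>
  &fq=isFolderActive:true
  &fq=isXref:false
  &fq=*:* -document_type_id:(3 7)

Each fq parameter is cached separately in the filterCache, so the speedup
comes from the same filter being reused across many different user
queries.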
A lot of the other things in the query appear to be for ID values that
are likely to change for every user. Query clauses like that are not a
good fit for filter queries.
Thanks,
Shawn