On 5/27/2017 7:14 AM, Daniel Angelov wrote:
> I would like to ask what the memory/CPU impact could be if the fq
> parameter in many of the queries is a long string (fq={!terms
> f=...}...,....) around 2,000,000 chars. Most of the queries are like
> "q={!frange l=Timestamp1 u=Timestamp2}... + some other criteria". This is
> with SolrCloud 4.1, on 10 hosts, with 3 collections holding around
> 10,000,000 docs in total. The queries are over all 3 collections.
>
> I sometimes get OOM exceptions, and I can see that GC times are pretty
> long. The heap size is 64 GB on each host. The cache settings are the
> defaults.
>
> Is it possible for the long fq parameter in the requests to cause OOM
> exceptions?

A two million character string in Java will take just over four million
bytes of memory.  Java stores strings in UTF-16 internally, so each
character occupies two bytes, and a String object adds roughly 56 bytes
of overhead on top of the character data.  With multiple shards, that
string is going to be copied for each shard, and there may be other
places in the Solr and Lucene code where it gets copied again.  At four
megabytes per copy, that's going to eat up memory quickly, and each
copy also takes a non-trivial amount of time.
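
To put rough numbers on that, here is a back-of-the-envelope sketch in
Java.  The two-bytes-per-character and 56-byte overhead figures are
approximations for a typical 64-bit JVM, and the shard count of 10 is
only a guess based on your 10 hosts:

    // Rough cost of one giant fq string, per copy and across shards.
    // Assumptions: UTF-16 storage (2 bytes/char), ~56 bytes of String
    // overhead, and one copy forwarded to each of 10 shards.
    public class FqMemoryEstimate {
        public static void main(String[] args) {
            long chars = 2_000_000L;        // length of the fq parameter
            long perCopy = chars * 2 + 56;  // ~4 MB per String copy
            int shards = 10;
            System.out.printf("per copy: %d bytes, across %d shards: %d bytes%n",
                    perCopy, shards, perCopy * shards);
        }
    }

That's about 40MB for a single query before Solr and Lucene make any
internal copies of their own.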

OOM exceptions on a 64GB heap?  Even if we account for the info just
mentioned and there are several copies of the two million character
string floating around, it sounds like you are running some massively
complex queries, or your index size is beyond gargantuan.  I cannot
imagine needing a 64GB heap for the roughly 10 million documents you
describe unless the system is handling some very unusual queries,
and/or an enormous index, and/or some *extremely* large Solr caches.

I suspect there are many details that we haven't heard yet.  I'm not
even sure exactly what to ask for, so I'll ask for the moon:

On a per-server basis, can we see the following info?

Total memory installed in the server.
How many Solr instances are running on the server.
The total amount of max heap memory allocated to Solr.
A list of other things running on the server besides Solr.
Total size of the solr home directory (a quick way to total this is
sketched after this list).
How many documents does that solr home size represent? If there are
multiple shards/replicas, all of them must be counted.
solrconfig.xml and the schema would be useful.
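
If you need a quick way to total the solr home directory, "du -sh" does
it on Linux.  Here is a portable Java sketch that does the same; the
default path below is only a placeholder, so point it at your actual
solr home:

    import java.io.File;

    // Recursively total the size of a directory tree.
    public class DirSize {
        static long size(File f) {
            if (f.isFile()) return f.length();
            long total = 0;
            File[] children = f.listFiles();
            if (children != null) {
                for (File c : children) total += size(c);
            }
            return total;
        }
        public static void main(String[] args) {
            // Placeholder path -- replace with your real solr home.
            File home = new File(args.length > 0 ? args[0] : "/opt/solr/home");
            System.out.printf("%s: %.2f GB%n", home,
                    size(home) / (1024.0 * 1024.0 * 1024.0));
        }
    }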

More general questions:

What does a typical query involve?
If there are facets, describe each field used in a facet -- term
cardinality, typical contents, analysis, etc.

If the system is running an OS with the "top" utility available, run top
(not htop or any other variant), press shift-M to sort by memory, grab a
screenshot, and put that screenshot somewhere on the Internet where we
can access it with a URL.  If it's on Windows, similar information can
be obtained from Resource Monitor: sort by "Working Set" on the Memory
tab.

Thanks,
Shawn
