Hello,

We just finished upgrading our three separate clusters from 7.2.1 to 7.3. This 
went fine except for our main text search collection, which appears to leak 
memory on commit!

After the initial upgrade we saw the cluster slowly running out of memory 
within about an hour and a half. We increased the heap in case 7.3 simply 
requires more of it, but the heap consumption graph still grows on each commit. 
Heap space cannot be reclaimed by forcing the garbage collector to run; 
everything just piles up in the OldGen. Running with this slightly larger heap, 
the first nodes run out of memory about two and a half hours after a 
cluster restart.

The heap-eating cluster is a 2-shard/3-replica system on separate nodes. Each 
replica is about 50 GB in size and holds about 8.5 million documents. On 7.2.1 
it ran fine with just a 2 GB heap. With 7.3 and a 2.5 GB heap, it just takes a 
little longer to run out of memory.

I inspected the reports shown by the VisualVM sampler and spotted one 
peculiarity: the number of SortedIntDocSet instances kept growing on each 
commit by about the same amount as the number of cached filter queries. This 
doesn't happen on the logs cluster, where SortedIntDocSet instances are neatly 
collected. The number of instances also roughly matches the number of commits 
since startup times the cache sizes.
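
To make that last observation concrete, here is a minimal sketch of the 
arithmetic. The function names, the 10% tolerance, and the example numbers 
are illustrative assumptions, not measurements from our clusters:

```python
def expected_retained_docsets(commits_since_start: int, filter_cache_entries: int) -> int:
    """Instances expected if every commit leaves one full set of cached
    filter entries behind that is never garbage collected."""
    return commits_since_start * filter_cache_entries


def looks_like_per_commit_leak(observed: int, commits: int, cache_entries: int,
                               tolerance: float = 0.1) -> bool:
    """True if the observed instance count is within `tolerance` (relative)
    of commits * cache_entries. The tolerance is an arbitrary choice."""
    expected = expected_retained_docsets(commits, cache_entries)
    if expected == 0:
        return observed == 0
    return abs(observed - expected) / expected <= tolerance


# Hypothetical numbers: 20 commits since startup, 512 cached filter entries.
print(expected_retained_docsets(20, 512))
print(looks_like_per_commit_leak(10100, 20, 512))
```

If the count seen in the profiler tracks this product as commits accumulate, 
that would be consistent with old filter cache entries staying reachable 
after each new searcher opens.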

Our other two clusters don't have this problem. One of them receives very few 
commits per day, but the other, which logs user interactions, receives a large 
amount of data all the time. I cannot reproduce the problem locally by just 
indexing data and committing all the time; the peak usage in OldGen stays 
about the same. But I can reproduce it locally when I introduce queries and 
filter queries while indexing pieces of data and committing them.
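
A minimal sketch of such a reproduction loop follows. It is not the exact 
procedure used here: the base URL, the collection name, and the filter field 
are placeholders to adjust. It only relies on Solr's standard /update (JSON 
documents with commit=true) and /select (q and fq) handlers.

```python
import json
import urllib.parse
import urllib.request

# Assumed values, not taken from the report above: adjust to your setup.
SOLR_BASE = "http://localhost:8983/solr"
COLLECTION = "test_collection"   # hypothetical collection name
FILTER_FIELD = "category_s"      # hypothetical field used in fq


def update_request(docs):
    """Build a POST that indexes `docs` and issues a hard commit."""
    url = f"{SOLR_BASE}/{COLLECTION}/update?commit=true"
    body = json.dumps(docs).encode("utf-8")
    return urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"}
    )


def select_url(fq_value):
    """Build a query URL whose fq clause should populate the filter cache."""
    params = urllib.parse.urlencode(
        {"q": "*:*", "fq": f"{FILTER_FIELD}:{fq_value}", "rows": 0}
    )
    return f"{SOLR_BASE}/{COLLECTION}/select?{params}"


def http_send(req):
    """Send a URL string or Request and discard the response body."""
    with urllib.request.urlopen(req) as resp:
        resp.read()


def run(rounds, filters_per_round, send=http_send):
    """Alternate one index+commit with a batch of distinct filter queries."""
    for r in range(rounds):
        docs = [{"id": f"doc-{r}", FILTER_FIELD: f"v{r % filters_per_round}"}]
        send(update_request(docs))
        for f in range(filters_per_round):
            send(select_url(f"v{f}"))
```

Calling run(200, 50) against a local node while watching OldGen in VisualVM 
would exercise the same mix of commits and filter queries; the round and 
filter counts are arbitrary.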

So, what is the problem? I dug through the CHANGES.txt of both Lucene and Solr, 
but nothing really caught my attention. Does anyone here have an idea where to 
look?

Many thanks,
Markus
