I do not think it is a reporting problem. After restarting some Solr instances
and watching top, shared memory dropped back to `normal`, around 350 MB, which
I think is high too, but anyway.

Two hours later, the restarted nodes are slowly increasing their shared memory
consumption, to about 1500 MB now. I don't understand why shared memory usage
should/would increase slowly over time. It makes little sense to me and I
cannot remember Solr doing this in the past ten years.

But it seems to correlate with index size on disk. These main text search nodes
have an index of around 16 GB and use up to 3 GB of shared memory after a few
days. The logs nodes have up to 800 MB index size and 320 MB of shared memory.
The low latency nodes have four different cores that make up just over 100 MB
index size, and their shared memory consumption is just 22 MB, which seems more
reasonable for shared memory.

I can also force Solr to 'leak' shared memory just by sending queries to it. My
freshly restarted local node used 68 MB shared memory at startup. Two minutes
and 25,000 queries later it was already at 2748 MB! At first there is a very
sharp increase to 2000 MB, then it takes almost two minutes more to increase to
2748 MB. I can decrease the maximum shared memory usage to 1200 MB if I query
(via edismax) only on fields of one language instead of 25 or so. I can also
decrease it further if I disable highlighting (HUH?) but still query on all
fields.
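
To watch the growth while queries are running, a small sampler along these
lines may help. It is only a sketch: it assumes Linux, and it assumes that the
third field of /proc/<pid>/statm (resident shared pages) is what top reports as
SHR, which is my understanding of procps but is not confirmed in this thread.
The function names are made up for illustration.

```python
import os
import time


def parse_statm(text, page_size):
    """Convert the first three /proc/<pid>/statm fields from pages to bytes."""
    size, resident, shared = (int(v) for v in text.split()[:3])
    return {
        "virt": size * page_size,     # total program size (VIRT)
        "res": resident * page_size,  # resident set size (RES)
        "shr": shared * page_size,    # resident shared pages (assumed SHR)
    }


def read_memory(pid):
    """Read the current memory figures for one process (Linux only)."""
    with open(f"/proc/{pid}/statm") as f:
        return parse_statm(f.read(), os.sysconf("SC_PAGE_SIZE"))


def watch(pid, interval=10.0, samples=6):
    """Print one timestamped line per sample so growth over time is visible."""
    for _ in range(samples):
        mem = read_memory(pid)
        fields = " ".join(f"{k}={v / 2**20:.0f}MB" for k, v in mem.items())
        print(time.strftime("%H:%M:%S"), fields)
        time.sleep(interval)


# Example: watch(10497) samples the Solr process with PID 10497.
```

Running watch() against the Solr PID before and during a query run should show
whether shr climbs with query volume, as described above.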

* We have tried patching Java's ByteBuffer [1] because it seemed to fit the
problem, but that does not fix it.
* We have also removed all our custom plugins, so it has become a vanilla Solr
6.6 with just our stripped-down schema and solrconfig. That does not fix it
either.

Why does it slowly increase over time?
Why does it appear to correlate to index size?
Is anyone else seeing this on their 6.6 cloud production or local machines?

Thanks,
Markus
 
[1]: http://www.evanjones.ca/java-bytebuffer-leak.html

-----Original message-----
> From:Shawn Heisey <apa...@elyograg.org>
> Sent: Tuesday 22nd August 2017 17:32
> To: solr-user@lucene.apache.org
> Subject: Re: Solr uses lots of shared memory!
> 
> On 8/22/2017 7:24 AM, Markus Jelsma wrote:
> > I have never seen this before, one of our collections, all nodes eating 
> > tons of shared memory!
> >
> > Here's one of the nodes:
> > 10497 solr      20   0 19.439g 4.505g 3.139g S   1.0 57.8   2511:46 java 
> >
> > RSS is roughly equal to heap size + usual off-heap space + shared memory. 
> > Virtual is equal to RSS and index size on disk. For two other collections, 
> > the nodes use shared memory as expected, in the MB range.
> >
> > How can Solr, this collection, use so much shared memory? Why?
> 
> I've seen this on my own servers at work, and when I add up a subset of
> the memory numbers I can see from the system, it ends up being more
> memory than I even have in the server.
> 
> I suspect there is something odd going on in how Java reports memory
> usage to the OS, or maybe a glitch in how Linux interprets Java's memory
> usage.  At some point in the past, numbers were reported correctly.  I
> do not know if the change came about because of a Solr upgrade, because
> of a Java upgrade, or because of an OS kernel upgrade.  All three were
> upgraded between when I know the numbers looked right and when I noticed
> they were wrong.
> 
> https://www.dropbox.com/s/91uqlrnfghr2heo/solr-memory-sorted-top.png?dl=0
> 
> This screenshot shows that Solr is using 17GB of memory, 41.45GB of
> memory is being used by the OS disk cache, and 10.23GB of memory is
> free.  Add those up, and it comes to 68.68GB ... but the machine only
> has 64GB of memory, and that total doesn't include the memory usage of
> the other processes seen in the screenshot.  This impossible situation
> means that something is being misreported somewhere.  If I deduct that
> 11GB of SHR from the RES value, then all the numbers work.
> 
> The screenshot was almost 3 years ago, so I do not know what machine it
> came from, and therefore I can't be sure what the actual heap size was. 
> I think it was about 6GB -- the difference between RES and SHR.  I have
> used a 6GB heap on some of my production servers in the past.  The
> server where I got this screenshot was not having any noticeable
> performance or memory problems, so I think that I can trust that the
> main numbers above the process list (which only come from the OS) are
> correct.
> 
> Thanks,
> Shawn
> 
> 

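The arithmetic in the quoted message can be checked with a few lines. This is
only a sketch using the figures Shawn quotes from his screenshot (in GB); the
variable names are made up, and the adjustment simply follows his suggestion of
deducting SHR from RES.

```python
# Figures quoted from the screenshot description, in GB.
solr_res = 17.0     # RES reported for the Solr process
disk_cache = 41.45  # OS disk cache
free = 10.23        # free memory
physical = 64.0     # memory installed in the machine
shr = 11.0          # SHR reported for the Solr process

# Adding the numbers as reported exceeds the physical memory.
naive_total = solr_res + disk_cache + free

# Deducting SHR from RES, as suggested in the quoted message.
adjusted_total = (solr_res - shr) + disk_cache + free

print(f"naive:    {naive_total:.2f} GB (physical: {physical:.0f} GB)")
print(f"adjusted: {adjusted_total:.2f} GB")
```

With SHR deducted the total fits within the 64 GB of the machine, leaving room
for the other processes.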