On 8/22/2017 7:24 AM, Markus Jelsma wrote:
> I have never seen this before, one of our collections, all nodes eating tons
> of shared memory!
>
> Here's one of the nodes:
> 10497 solr 20 0 19.439g 4.505g 3.139g S 1.0 57.8 2511:46 java
>
> RSS is roughly equal to heap size + usual off-heap space + shared memory.
> Virtual is equal to RSS and index size on disk. For two other collections,
> the nodes use shared memory as expected, in the MB range.
>
> How can Solr, this collection, use so much shared memory? Why?

I've seen this on my own servers at work, and when I add up a subset of the memory numbers I can see from the system, it ends up being more memory than I even have in the server.

I suspect there is something odd going on in how Java reports memory usage to the OS, or maybe a glitch in how Linux interprets Java's memory usage. At some point in the past, numbers were reported correctly. I do not know if the change came about because of a Solr upgrade, because of a Java upgrade, or because of an OS kernel upgrade. All three were upgraded between when I know the numbers looked right and when I noticed they were wrong.

https://www.dropbox.com/s/91uqlrnfghr2heo/solr-memory-sorted-top.png?dl=0

This screenshot shows that Solr is using 17GB of memory, 41.45GB of memory is being used by the OS disk cache, and 10.23GB of memory is free. Add those up, and it comes to 68.68GB ... but the machine only has 64GB of memory, and that total doesn't include the memory usage of the other processes seen in the screenshot. This impossible situation means that something is being misreported somewhere. If I deduct that 11GB of SHR from the RES value, then all the numbers work.

The screenshot was almost 3 years ago, so I do not know what machine it came from, and therefore I can't be sure what the actual heap size was. I think it was about 6GB -- the difference between RES and SHR. I have used a 6GB heap on some of my production servers in the past.

The server where I got this screenshot was not having any noticeable performance or memory problems, so I think that I can trust that the main numbers above the process list (which only come from the OS) are correct.

Thanks,
Shawn
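
For anyone who wants to re-check the arithmetic above, here is a minimal Python sketch. The figures are the ones quoted from the screenshot (in GB); the function and variable names are made up for illustration, and the idea that the SHR part of RES should simply be subtracted is only the assumption tested in this message, not a confirmed explanation.

```python
# Figures quoted from the screenshot, in GB.
SOLR_RES = 17.0      # RES reported for the Solr process
SOLR_SHR = 11.0      # SHR reported for the Solr process
DISK_CACHE = 41.45   # memory used by the OS disk cache
FREE = 10.23         # free memory
TOTAL_RAM = 64.0     # physical memory in the machine


def accounted_memory(res, cache, free, shared=0.0):
    """Sum the three figures, optionally deducting the shared part of RES.

    Deducting `shared` is an assumption for illustration only.
    """
    return round((res - shared) + cache + free, 2)


naive = accounted_memory(SOLR_RES, DISK_CACHE, FREE)
adjusted = accounted_memory(SOLR_RES, DISK_CACHE, FREE, shared=SOLR_SHR)

print(f"RES + cache + free         = {naive} GB (machine has {TOTAL_RAM} GB)")
print(f"(RES - SHR) + cache + free = {adjusted} GB")
print(f"RES - SHR                  = {SOLR_RES - SOLR_SHR} GB")
```

The first total comes out larger than the physical memory, while the second leaves room for the other processes, which matches the reasoning above. It does not say which of Solr, Java, or the kernel is responsible for the numbers.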