A few more numbers to contemplate. The experiment here adds 80
PDF and PPTX files into an empty index:

  Solr v8.0, regular settings:              1.7GB quiescent, 1.9GB while indexing, 2.92 minutes
  Solr v8.0, GC_TUNE from v8.1 solr.in.sh:  1.1GB quiescent, 1.3GB while indexing, 2.97 minutes
  Solr v8.1, regular settings:              4.3GB quiescent, 4.4GB while indexing, 1.67 minutes
  Solr v8.1, GC_TUNE from v8.1 solr.in.sh:  1.0GB quiescent, 1.3GB while indexing, 1.53 minutes
It is clear that the GC_TUNE settings from v8.1 are beneficial to
v8.0, saving about 600MB of memory. That's not small change.
Also clear is that Solr v8.1 is slightly faster than v8.0 when both
use those TUNE values. A hidden benefit.
Without the GC_TUNE settings, Solr v8.1 shows its appetite for
memory: several GB more than v8.0.
Because those TUNE settings can make an improvement to Solr v8.0,
it would be beneficial for the documentation to discuss that usage.
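For anyone wanting to try this on v8.0: the G1 settings that v8.1 adopted
(SOLR-13394) can be placed in v8.0's solr.in.sh along the lines below. Do
verify the flags against your own 8.1 copy, as the exact set may differ by
release.

```shell
# In solr.in.sh (Solr v8.0): borrow the G1 settings introduced in v8.1.
# Check against your 8.1 distribution; the exact flag set may differ.
GC_TUNE="-XX:+UseG1GC \
  -XX:+PerfDisableSharedMem \
  -XX:+ParallelRefProcEnabled \
  -XX:MaxGCPauseMillis=250 \
  -XX:+UseLargePages \
  -XX:+AlwaysPreTouch"
```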
Meanwhile, the memory consumption problem remains as discussed.
On the overfeeding side of things: the classical approach is to
pipeline the work and, between each stage, have a go/stop sign to
throttle traffic (a road-crossing lollipop lady, if you like). Such a
sign could be set when a regional thread-consumption limit is reached,
or a similar resource limit is encountered. This permits one stage to
stop listening while work continues within it and the many other
stages; then the sign changes to go and the regional flow resumes. We
see this in ordinary road and foot traffic every day. It is nicely
asynchronous and does not need a complicated (nor indeed any) master
controller. The key is to set limits based on sound engineering
criteria, and yes, that might mean having a few sets of them for
different operating situations, with the customer choosing
appropriately.
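A minimal sketch of that throttling idea, with a bounded queue serving as
the go/stop sign between two stages. The names, the queue depth, and the
"<eof>" sentinel are all illustrative, not anything from Solr's internals:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Two-stage pipeline: stage 1 feeds documents to stage 2 through a
// bounded queue. When the queue is full the producer blocks ("stop");
// when the consumer drains it, the producer resumes ("go").
public class BoundedPipeline {
    static final int CAPACITY = 4;  // the engineering limit: queue depth

    public static List<String> run(List<String> docs) throws InterruptedException {
        BlockingQueue<String> handoff = new ArrayBlockingQueue<>(CAPACITY);
        List<String> indexed = new ArrayList<>();

        Thread consumer = new Thread(() -> {
            try {
                for (;;) {
                    String doc = handoff.take();        // stage 2: "index"
                    if (doc.equals("<eof>")) return;    // sentinel ends the run
                    synchronized (indexed) { indexed.add(doc.toUpperCase()); }
                }
            } catch (InterruptedException ignored) { }
        });
        consumer.start();

        for (String doc : docs) {
            handoff.put(doc);   // stage 1 blocks here when the queue is full:
                                // the sign reads "stop" until stage 2 drains it
        }
        handoff.put("<eof>");
        consumer.join();
        return indexed;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(run(List.of("a", "b", "c")));
    }
}
```

No master controller is involved: the queue's own capacity coordinates the
two stages asynchronously, which is the point of the analogy above.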
Thanks,
Joe D.
On 27/05/2019 11:05, Joe Doupnik wrote:
You are certainly correct about using external load balancers when
appropriate. However, a basic problem with servers, that of accepting
more incoming items than can be handled gracefully, is, as we know, an
age-old one, solved by back-pressure methods (particularly hard
limits). My experience with Solr suggests that parts (say Tika) are
being too nice to incoming material, letting too many items enter the
application, consume resources, and so forth which then become awkward
to handle (see the locks item discussion cited earlier). Entry ought
to be blocked until the processing structure declares that resources
are available to accept new entries (a full but not overfull
pipeline). Those internal issues, locks, memory and similar, are
resolvable when limits are imposed. Also, with limits then your
mentioned load balancers stand a chance of sensing when a particular
server is currently not accepting new requests. Establishing limits
does take some creative thinking about how the system as a whole is
constructed.
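As one illustration of such a hard limit (the names and capacity are my
own, not Solr's): a counting semaphore that refuses work outright when
full, which gives an external load balancer something concrete to sense
(for instance, an HTTP 503 from the refused path):

```java
import java.util.concurrent.Semaphore;

// A hard admission limit that fails fast instead of queueing. When no
// capacity is free, the request is refused outright; the caller (or a
// load balancer probing this server) learns immediately that the server
// is currently not accepting new requests.
public class HardLimit {
    private final Semaphore capacity;

    public HardLimit(int maxConcurrent) {
        capacity = new Semaphore(maxConcurrent);
    }

    /** Returns true if the request was run, false if it was refused. */
    public boolean tryHandle(Runnable request) {
        if (!capacity.tryAcquire()) {
            return false;   // "currently not accepting new requests"
        }
        try {
            request.run();
        } finally {
            capacity.release();
        }
        return true;
    }

    public static void main(String[] args) {
        HardLimit limit = new HardLimit(1);
        System.out.println("first accepted: " + limit.tryHandle(() -> { }));
    }
}
```

The fail-fast variant suits health checks; a blocking variant (waiting for
a permit instead of refusing) suits the full-but-not-overfull pipeline.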
I brought up the overload case because it pertains to this main
memory management thread.
Thanks,
Joe D.
On 27/05/2019 10:21, Bernd Fehling wrote:
I think it is not fair to blame Solr for not also having a load balancer.
It is up to you and your needs to set up the required infrastructure,
including load balancing. There are many products available on the market.
If your current system can't handle all requests then install more
replicas.
Regards
Bernd
Am 27.05.19 um 10:33 schrieb Joe Doupnik:
While on the topic of resource consumption and locks etc, there
is one other aspect to which Solr has been vulnerable. It is failing
to fend off too many requests at one time. The standard approach is,
of course, called back pressure: not replying to a query until
resources permit, thus keeping the competition outside the
application. That limits resource consumption, including locks,
memory and sundry, while permitting normal work within to progress
smoothly. Let the crowds coming to a hit show queue in the rain
outside the theatre until empty seats become available.
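A small sketch of that theatre-queue idea (all names and the seat count
are illustrative): a blocking acquire makes callers wait outside until a
seat frees up, so the peak occupancy inside never exceeds the limit:

```java
import java.util.concurrent.Semaphore;
import java.util.concurrent.atomic.AtomicInteger;

// Back pressure at the front door: a fixed number of "seats" (permits)
// bounds how many requests are inside the application at once.
public class AdmissionGate {
    private final Semaphore seats;
    private final AtomicInteger inside = new AtomicInteger();
    private final AtomicInteger peakInside = new AtomicInteger();

    public AdmissionGate(int capacity) {
        seats = new Semaphore(capacity);
    }

    public void handle(Runnable request) throws InterruptedException {
        seats.acquire();              // queue in the rain until a seat frees up
        try {
            int now = inside.incrementAndGet();
            peakInside.accumulateAndGet(now, Math::max);
            request.run();            // normal work proceeds inside
        } finally {
            inside.decrementAndGet();
            seats.release();          // an empty seat becomes available again
        }
    }

    public int peak() { return peakInside.get(); }

    public static void main(String[] args) throws InterruptedException {
        AdmissionGate gate = new AdmissionGate(2);
        Thread[] ts = new Thread[8];
        for (int i = 0; i < ts.length; i++) {
            ts[i] = new Thread(() -> {
                try {
                    gate.handle(() -> {
                        try { Thread.sleep(5); } catch (InterruptedException e) { }
                    });
                } catch (InterruptedException e) { }
            });
            ts[i].start();
        }
        for (Thread t : ts) t.join();
        System.out.println("peak inside with 2 seats: " + gate.peak());
    }
}
```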
On 27/05/2019 08:52, Joe Doupnik wrote:
Generalizations tend to fail when confronted with conflicting
evidence. The simple evidence is asking how much real memory the
Solr owned process has been allocated (top, or ps aux or similar)
and that yields two very different values (the ~1.6GB of Solr v8.0
and 4.5+GB of Solr v8.1). I have no knowledge of how Java chooses
to name its usage (heap or otherwise). Prior to v8.1, Solr memory
consumption varied with activity, thus memory management was
occurring: memory was borrowed from and returned to the system. What
might be happening in Solr v8.1 is the new memory management code
is failing to do a proper job, for reasons which are not visible to
us in the field, and that failure is important to us.
In regard to the referenced lock discussion, it would be a good
idea not to let the tail wag the dog: tend to the common cases and
live with a few corner-case difficulties, because perfection is not
possible.
Thanks,
Joe D.
On 26/05/2019 20:30, Shawn Heisey wrote:
On 5/26/2019 12:52 PM, Joe Doupnik wrote:
I do queries while indexing, have done so for a long time,
without difficulty nor memory usage spikes from dual use. The
system has been designed to support that.
Again, one may look at the numbers using "top" or similar.
Try Solr v8.0 and 8.1 to see the difference which I experience
here. For reference, the only memory adjustables set in my
configuration are in the Solr startup script solr.in.sh: adding
"-Xss1024k" to the SOLR_OPTS list and setting SOLR_HEAP="4024m".
There is one significant difference between 8.0 and 8.1 in the
realm of memory management -- we have switched from the CMS
garbage collector to the G1 collector. So the way that Java
manages the heap has changed. This was done because the CMS
collector is slated for removal from Java.
https://issues.apache.org/jira/browse/SOLR-13394
Java is unlike other programs in one respect -- once it allocates
heap from the OS, it never gives it back. This behavior has given
Java an undeserved reputation as a memory hog ... but in fact
Java's overall memory usage can be very easily limited ... an
option that many other programs do NOT have.
In your configuration, you set the max heap to a little less than
4GB. You have to expect that it *WILL* use that memory. By using
the SOLR_HEAP variable, you have instructed Solr's startup script
to use the same setting for the minimum heap as well as the
maximum heap. This is the design intent.
If you want to know how much heap is being used, you can't ask the
operating system, which means tools like top. You have to ask
Java. And you will have to look at a long-term graph, finding the
low points. An instantaneous look at Java's heap usage could show
you that the whole heap is allocated ... but a significant part of
that allocation could be garbage, which becomes available once the
garbage is collected.
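For completeness, a minimal way to ask Java rather than the OS, via the
standard MemoryMXBean. A single sample is only an upper bound, since it
includes uncollected garbage; the long-term low points mentioned above
require sampling repeatedly over time:

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryUsage;

// Asking Java (not the OS) about heap usage. "used" includes garbage
// that has not yet been collected; "committed" is what the JVM has
// actually taken from the OS; "max" corresponds to -Xmx.
public class HeapSample {
    public static MemoryUsage sample() {
        return ManagementFactory.getMemoryMXBean().getHeapMemoryUsage();
    }

    public static void main(String[] args) {
        MemoryUsage u = sample();
        System.out.printf("used=%dMB committed=%dMB max=%dMB%n",
                u.getUsed() >> 20, u.getCommitted() >> 20, u.getMax() >> 20);
    }
}
```

The same numbers are available without code from `jstat -gc <pid>` or a
JMX console, which is the more usual way to build the long-term graph.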
Thanks,
Shawn