A few more numbers to contemplate, from an experiment here adding 80 PDF and PPTX files into an empty index.

    Solr v8.0, regular settings: 1.7GB quiescent memory consumption, 1.9GB while indexing, 2.92 minutes to do the job.
    Solr v8.0, using GC_TUNE from the v8.1 solr.in.sh: 1.1GB quiescent, 1.3GB while indexing, 2.97 minutes.
    Solr v8.1, regular settings: 4.3GB quiescent, 4.4GB while indexing, 1.67 minutes.
    Solr v8.1, using GC_TUNE from the v8.1 solr.in.sh: 1.0GB quiescent, 1.3GB while indexing, 1.53 minutes.

    It is clear that the GC_TUNE settings from v8.1 are also beneficial to v8.0, saving about 600MB of memory. That's not small change.
    Also clear is that Solr v8.1 is slightly faster than v8.0 when both use those GC_TUNE values: a hidden benefit.
    Without the GC_TUNE settings, Solr v8.1 shows its appetite for memory, several GB more than v8.0.

    Because those GC_TUNE settings can make an improvement to Solr v8.0, it would be beneficial to have the documentation discuss that usage. Meanwhile, the memory consumption problem remains as discussed.
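    As an aside, when trying GC_TUNE values it is handy to confirm which collector the JVM actually ended up running, rather than trusting the edit to solr.in.sh. A minimal sketch using the standard java.lang.management API (the same MXBeans are also reachable remotely over JMX; the exact collector names printed depend on the JVM build):

    import java.lang.management.GarbageCollectorMXBean;
    import java.lang.management.ManagementFactory;

    // Lists the garbage collectors active in this JVM, e.g. "G1 Young Generation"
    // and "G1 Old Generation" when G1 is in use, or the CMS-era names otherwise.
    public class ShowCollectors {
        public static void main(String[] args) {
            for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
                System.out.println(gc.getName() + ", collections so far: " + gc.getCollectionCount());
            }
        }
    }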

    On the overfeeding part of things: the classical approach is to pipeline the work and, between each stage, have a go/stop sign to throttle traffic (a road-crossing lollipop lady, if you like). Such signs could be set when a stage's thread consumption reaches a limit, or some similar resource limit is encountered. This permits one stage to stop listening while work continues within it and within the other stages; when the sign changes back to go, flow through that region resumes. We see this in everyday road and pedestrian traffic. It is nicely asynchronous and does not need a complicated (or indeed any) master controller. The key is to have limits based on sound engineering criteria, and yes, that might mean having a few sets of them for different operating situations, with the customer choosing appropriately. A minimal sketch of the pattern follows below.
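    To make the go/stop sign concrete, here is a generic sketch of one such stage built on a bounded queue; this is not Solr or Tika code, just the plain java.util.concurrent pattern, and the capacity value stands in for whatever the sound engineering criteria dictate.

    import java.util.concurrent.ArrayBlockingQueue;
    import java.util.concurrent.BlockingQueue;
    import java.util.function.UnaryOperator;

    // A generic bounded pipeline stage. The queue capacity is the go/stop sign:
    // when the stage is full, put() blocks the upstream caller (stop); as the
    // worker drains items, space frees and upstream resumes (go). No central
    // controller is needed.
    class Stage<T> {
        private final BlockingQueue<T> inbox;

        Stage(int capacity) {
            this.inbox = new ArrayBlockingQueue<>(capacity);
        }

        // Called by the upstream stage; blocks rather than over-committing resources.
        void submit(T item) throws InterruptedException {
            inbox.put(item);
        }

        // Run on this stage's worker thread(s); hands results to the next stage,
        // which may itself block if it is the current bottleneck.
        void runWorker(Stage<T> next, UnaryOperator<T> work) throws InterruptedException {
            while (!Thread.currentThread().isInterrupted()) {
                T item = inbox.take();        // wait for work
                T result = work.apply(item);  // e.g. text extraction for one document
                if (next != null) {
                    next.submit(result);
                }
            }
        }
    }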
    Thanks,
    Joe D.

On 27/05/2019 11:05, Joe Doupnik wrote:
You are certainly correct about using external load balancers when appropriate. However, a basic problem with servers, that of accepting more incoming items than can be handled gracefully, is as we know an age-old one, solved by back pressure methods (particularly hard limits). My experience with Solr suggests that parts of it (say Tika) are being too nice to incoming material, letting too many items enter the application and consume resources, which then become awkward to handle (see the locks discussion cited earlier). Entry ought to be blocked until the processing structure declares that resources are available to accept new entries (a full but not overfull pipeline). Those internal issues, locks, memory and similar, are resolvable when limits are imposed. Also, with limits in place, the load balancers you mention stand a chance of sensing when a particular server is not currently accepting new requests; a sketch of such an admission limit follows below. Establishing limits does take some creative thinking about how the system as a whole is constructed.
    I brought up the overload case because it pertains to this main memory management thread.
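    By way of illustration only (nothing that Solr currently ships, and the class and parameter names are made up for the sketch), such a limit can be as small as an admission gate wrapped around a request handler; rejected requests fail fast, which is exactly the signal an external load balancer can act on.

    import java.util.concurrent.Semaphore;
    import java.util.function.Function;

    // A generic admission gate: at most maxInFlight requests are processed at
    // once; anything beyond that is rejected immediately so the caller, or a
    // load balancer watching for rejections, can back off or try another node.
    class AdmissionGate {
        private final Semaphore permits;

        AdmissionGate(int maxInFlight) {
            this.permits = new Semaphore(maxInFlight);
        }

        // Request/response types and the handler are placeholders for whatever
        // the server framework really provides.
        <Q, R> R handle(Q request, Function<Q, R> handler, R busyResponse) {
            if (!permits.tryAcquire()) {
                return busyResponse;            // e.g. an HTTP 503 "server busy" reply
            }
            try {
                return handler.apply(request);  // normal processing, resources reserved
            } finally {
                permits.release();
            }
        }
    }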
    Thanks,
    Joe D.

On 27/05/2019 10:21, Bernd Fehling wrote:
I think it is not fair to blame Solr for not also having a load balancer.
It is up to you and your needs to set up the required infrastructure,
including load balancing. There are many products available on the market.
If your current system can't handle all requests, then install more replicas.

Regards
Bernd

Am 27.05.19 um 10:33 schrieb Joe Doupnik:
     While on the topic of resource consumption and locks etc., there is one other aspect to which Solr has been vulnerable: failing to fend off too many requests at one time. The standard approach is, of course, back pressure, such as not replying to a query until resources permit, thus keeping competition outside of the application. That limits resource consumption, including locks, memory and sundry, while permitting normal work within to progress smoothly. Let the crowds coming to a hit show queue in the rain outside the theatre until empty seats become available.

On 27/05/2019 08:52, Joe Doupnik wrote:
Generalizations tend to fail when confronted with conflicting evidence. The simple evidence here is asking how much real memory the Solr-owned process has been allocated (top, ps aux, or similar), and that yields two very different values (the ~1.6GB of Solr v8.0 and the 4.5+GB of Solr v8.1). I have no knowledge of how Java chooses to name its usage (heap or otherwise). Prior to v8.1, Solr memory consumption varied with activity, thus memory management was occurring: memory was borrowed from and returned to the system. What might be happening in Solr v8.1 is that the new memory management code is failing to do a proper job, for reasons which are not visible to us in the field, and that failure is important to us.
    In regard to the referenced lock discussion, it would be a good idea to not let the tail wag the dog: tend the common cases and live with a few corner-case difficulties, because perfection is not possible.
    Thanks,
    Joe D.

On 26/05/2019 20:30, Shawn Heisey wrote:
On 5/26/2019 12:52 PM, Joe Doupnik wrote:
     I do queries while indexing, have done so for a long time, without difficulty or memory usage spikes from the dual use. The system has been designed to support that.
     Again, one may look at the numbers using "top" or similar. Try Solr v8.0 and v8.1 to see the difference which I experience here. For reference, the only memory adjustables set in my configuration are in the Solr startup script solr.in.sh: adding "-Xss1024k" to the SOLR_OPTS list and setting SOLR_HEAP="4024m".

There is one significant difference between 8.0 and 8.1 in the realm of memory management -- we have switched from the CMS garbage collector to the G1 collector.  So the way that Java manages the heap has changed. This was done because the CMS collector is slated for removal from Java.

https://issues.apache.org/jira/browse/SOLR-13394

Java is unlike other programs in one respect -- once it allocates heap from the OS, it never gives it back.  This behavior has given Java an undeserved reputation as a memory hog ... but in fact Java's overall memory usage can be very easily limited ... an option that many other programs do NOT have.

In your configuration, you set the max heap to a little less than 4GB. You have to expect that it *WILL* use that memory.  By using the SOLR_HEAP variable, you have instructed Solr's startup script to use the same setting for the minimum heap as well as the maximum heap. This is the design intent.

If you want to know how much heap is being used, you can't ask the operating system, which means tools like top.  You have to ask Java. And you will have to look at a long-term graph, finding the low points. An instantaneous look at Java's heap usage could show you that the whole heap is allocated ... but a significant part of that allocation could be garbage, which becomes available once the garbage is collected.
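As a minimal sketch of "asking Java" from inside the JVM (the same information is what JConsole, the Solr admin UI, or a JMX connection will graph over time; a single sample is subject to the garbage caveat above):

    import java.lang.management.ManagementFactory;
    import java.lang.management.MemoryUsage;

    // Reports the heap as the JVM sees it: "used" includes garbage not yet
    // collected, while "committed" is the amount actually claimed from the OS
    // and so is part of what top/ps report for the process.
    public class HeapSnapshot {
        public static void main(String[] args) {
            MemoryUsage heap = ManagementFactory.getMemoryMXBean().getHeapMemoryUsage();
            System.out.printf("heap used=%dMB committed=%dMB max=%dMB%n",
                    heap.getUsed() >> 20, heap.getCommitted() >> 20, heap.getMax() >> 20);
        }
    }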

Thanks,
Shawn



