You are certainly correct about using external load balancers when
appropriate. However, a basic problem with servers, that of accepting
more incoming items than can be handled gracefully, is, as we know, an
age-old one, and it is solved by back-pressure methods (particularly
hard limits). My experience with Solr suggests that parts of it (say,
Tika) are too lenient with incoming material, letting too many items
enter the application, consume resources, and so forth, which then
becomes awkward to handle (see the locks discussion cited earlier).
Entry ought to be blocked until the processing structure declares that
resources are available to accept new entries (a full, but not
overfull, pipeline). Those internal issues, locks, memory and similar,
become resolvable once limits are imposed. Also, with limits in place,
the load balancers you mention stand a chance of sensing when a
particular server is not currently accepting new requests. Establishing
limits does take some creative thinking about how the system as a whole
is constructed.
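To make the idea concrete, here is a minimal sketch in Java of such a
hard limit, a bounded ingest queue whose put() blocks new entries until
the pipeline has room. The class name, method names and the limit of
100 are purely illustrative, not anything from Solr itself:

    import java.util.concurrent.ArrayBlockingQueue;
    import java.util.concurrent.BlockingQueue;

    // Back pressure by hard limit: the pipeline holds at most a
    // fixed number of pending items; entry blocks when it is full.
    public class BoundedIngest {
        private final BlockingQueue<String> pipeline =
                new ArrayBlockingQueue<>(100);   // the hard limit

        // Accept path: blocks until the pipeline declares room.
        public void submit(String item) throws InterruptedException {
            pipeline.put(item);
        }

        // Worker path: frees a slot as each item is taken for work.
        public String next() throws InterruptedException {
            return pipeline.take();
        }
    }

With such a gate in place, excess work waits outside the application
instead of consuming its locks and memory.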
I brought up the overload case because it pertains to this memory
management thread.
Thanks,
Joe D.
On 27/05/2019 10:21, Bernd Fehling wrote:
I think it is not fair to blame Solr for not also having a load
balancer. It is up to you and your needs to set up the required
infrastructure, including load balancing. There are many products
available on the market. If your current system can't handle all
requests, then install more replicas.
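For instance, in SolrCloud a further replica can be requested through
the Collections API; a minimal SolrJ sketch (the collection and shard
names are placeholders for your own):

    import org.apache.solr.client.solrj.impl.HttpSolrClient;
    import org.apache.solr.client.solrj.request.CollectionAdminRequest;

    public class AddReplica {
        public static void main(String[] args) throws Exception {
            try (HttpSolrClient client = new HttpSolrClient.Builder(
                    "http://localhost:8983/solr").build()) {
                // Ask the cluster to place one more replica of shard1.
                CollectionAdminRequest
                        .addReplicaToShard("mycollection", "shard1")
                        .process(client);
            }
        }
    }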
Regards
Bernd
Am 27.05.19 um 10:33 schrieb Joe Doupnik:
While on the topic of resource consumption and locks etc., there
is one other aspect to which Solr has been vulnerable: failing to fend
off too many requests at one time. The standard approach is, of course,
back pressure, such as not replying to a query until resources permit
and thus keeping the competition outside the application. That limits
resource consumption, including locks, memory and sundry, while
permitting normal work within to progress smoothly. Let the crowds
coming to a hit show queue in the rain outside the theatre until empty
seats become available.
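A sketch of that theatre-door rule in Java, using a counting semaphore
as the hard limit (the limit of 32 and the names are merely
illustrative, not Solr internals):

    import java.util.concurrent.Semaphore;

    // At most a fixed number of queries are serviced at once;
    // the rest wait outside until a seat becomes free.
    public class RequestGate {
        private final Semaphore seats = new Semaphore(32);

        public String handle(String query) throws InterruptedException {
            seats.acquire();            // queue outside the theatre
            try {
                return runQuery(query); // bounded locks and memory inside
            } finally {
                seats.release();        // a seat is free again
            }
        }

        private String runQuery(String query) {
            return "results for " + query; // stand-in for the real work
        }
    }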
On 27/05/2019 08:52, Joe Doupnik wrote:
Generalizations tend to fail when confronted with conflicting
evidence. The simple evidence comes from asking how much real memory
the Solr-owned process has been allocated (top, ps aux, or similar),
and that yields two very different values (the ~1.6GB of Solr v8.0 and
4.5+GB of Solr v8.1). I have no knowledge of how Java chooses to name
its usage (heap or otherwise). Prior to v8.1, Solr memory consumption
varied with activity; thus memory management was occurring, and memory
was borrowed from and returned to the system. What might be happening
in Solr v8.1 is that the new memory management code is failing to do a
proper job, for reasons which are not visible to us in the field, and
that failure is important to us.
In regard to the referenced lock discussion, it would be a good
idea not to let the tail wag the dog: tend the common cases and live
with a few corner-case difficulties, because perfection is not
possible.
Thanks,
Joe D.
On 26/05/2019 20:30, Shawn Heisey wrote:
On 5/26/2019 12:52 PM, Joe Doupnik wrote:
I do queries while indexing, have done so for a long time,
without difficulty or memory usage spikes from the dual use. The
system has been designed to support that.
Again, one may look at the numbers using "top" or similar.
Try Solr v8.0 and v8.1 to see the difference which I experience
here. For reference, the only memory adjustables set in my
configuration are in the Solr startup script solr.in.sh: adding
"-Xss1024k" to the SOLR_OPTS list and setting SOLR_HEAP="4024m".
There is one significant difference between 8.0 and 8.1 in the
realm of memory management -- we have switched from the CMS garbage
collector to the G1 collector. So the way that Java manages the
heap has changed. This was done because the CMS collector is slated
for removal from Java.
https://issues.apache.org/jira/browse/SOLR-13394
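For anyone wanting to compare the two collectors directly, the choice
can be overridden in solr.in.sh; a sketch, assuming the stock script's
GC_TUNE variable (the exact defaults differ between releases):

    # Run 8.1 with the old collector for an A/B comparison
    GC_TUNE="-XX:+UseConcMarkSweepGC"
    # or state the new default explicitly
    GC_TUNE="-XX:+UseG1GC"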
Java is unlike other programs in one respect -- once it allocates
heap from the OS, it never gives it back. This behavior has given
Java an undeserved reputation as a memory hog ... but in fact
Java's overall memory usage can be very easily limited ... an
option that many other programs do NOT have.
In your configuration, you set the max heap to a little less than
4GB. You have to expect that it *WILL* use that memory. By using
the SOLR_HEAP variable, you have instructed Solr's startup script
to use the same setting for the minimum heap as well as the maximum
heap. This is the design intent.
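Concretely, with those settings the startup script hands the JVM
something like the following arguments (a sketch, assuming stock
script behavior):

    -Xms4024m -Xmx4024m -Xss1024k

which is why the process is expected to claim the full amount from the
OS right away.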
If you want to know how much heap is being used, you can't ask the
operating system, which means tools like top. You have to ask
Java. And you will have to look at a long-term graph, finding the
low points. An instantaneous look at Java's heap usage could show
you that the whole heap is allocated ... but a significant part of
that allocation could be garbage, which becomes available once the
garbage is collected.
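One way to ask Java directly, from inside the same JVM, is the
MemoryMXBean (a minimal sketch; for a running Solr you would reach the
same numbers remotely over JMX, or with tools such as jconsole or
jstat):

    import java.lang.management.ManagementFactory;
    import java.lang.management.MemoryUsage;

    public class HeapCheck {
        public static void main(String[] args) {
            MemoryUsage heap = ManagementFactory
                    .getMemoryMXBean().getHeapMemoryUsage();
            // "used" still includes uncollected garbage; only the
            // low points over time reflect live data.
            System.out.printf("heap used=%d MB committed=%d MB max=%d MB%n",
                    heap.getUsed() >> 20,
                    heap.getCommitted() >> 20,
                    heap.getMax() >> 20);
        }
    }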
Thanks,
Shawn