That's good. One further point worth mentioning on this
matter: feeding files into Tika (in my case) is paced to avoid
overloads. My crawler does that by taking a small adjustable pause
(~100 ms) after each file submission, and then longer ones (1-3 sec)
after every 100 and 1000 submissions. The crawler is also set to run at
a lower priority than Solr, thus giving preference to Solr.
In the end we ought to run experiments to find and verify working
values.
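The pacing scheme described above can be sketched roughly as follows. This is my illustrative reconstruction, not the actual crawler code: the `submit()` hook, function name, and exact pause lengths are assumptions.

```python
import time

def paced_submit(files, submit, sleep=time.sleep,
                 per_file=0.1, per_100=1.0, per_1000=3.0):
    """Feed files to the indexer with throttling pauses (sketch).

    A short pause follows every submission, with longer pauses after
    every 100th and 1000th document, to avoid overloading Tika/Solr.
    """
    for count, f in enumerate(files, start=1):
        submit(f)            # e.g. POST the file to Tika/Solr for extraction
        sleep(per_file)      # ~100 ms pause after each submission
        if count % 1000 == 0:
            sleep(per_1000)  # longer (3 s) pause every 1000 submissions
        elif count % 100 == 0:
            sleep(per_100)   # medium (1 s) pause every 100 submissions
```

The `sleep` parameter is injectable mainly so the pacing logic can be tested without real delays; in production it defaults to `time.sleep`.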
Thanks,
Joe D.
On 02/09/2020 03:40, yaswanth kumar wrote:
I have a better understanding of my original question now. Thanks, all, for your
valuable explanations.
Sent from my iPhone
On Sep 1, 2020, at 2:01 PM, Joe Doupnik <j...@netlab1.net> wrote:
As I have not received the follow-on message to mine, I will cut and paste it
below.
My comment on that is that the numbers are the numbers. More importantly, I have run
large imports (~0.5M docs) and watched as they progress. My crawler paces material
into Solr. Memory usage (Linux "top") shows small cyclic rises and falls,
peaking at about 2GB, as the crawler introduces 1 and 3 second pauses after every hundred and
thousand submissions. The test shown in my original message is sufficient to show the
nature of Solr versions and the choice of garbage collector, and other folks can do
similar experiments on their own gear. The quoted tests are indeed representative of large
and small amounts of various kinds of documents; I say that based on much experience
observing the details.
Quibble about GC names if you wish, but please do look at those experimental
results. Also note the difference in our SOLR_HEAP values: 2GB in my work, 8GB
in yours. I have found 2GB to work well for importing both small and very large
collections (of many file varieties).
Thanks,
Joe D.
This is misleading and not particularly good advice.
Solr 8 does NOT contain G1. G1GC is a feature of the JVM. We’ve been using
it with Java 8 and Solr 6.6.2 for a few years.
A test with eighty documents doesn’t test anything. Try a million documents to
get Solr memory usage warmed up.
GC_TUNE has been in the solr.in.sh file for a long time. Here are the settings
we use with Java 8. We have about 120 hosts running Solr in six prod clusters.
SOLR_HEAP=8g
# Use G1 GC -- wunder 2017-01-23
# Settings from https://wiki.apache.org/solr/ShawnHeisey
GC_TUNE=" \
-XX:+UseG1GC \
-XX:+ParallelRefProcEnabled \
-XX:G1HeapRegionSize=8m \
-XX:MaxGCPauseMillis=200 \
-XX:+UseLargePages \
-XX:+AggressiveOpts \
"
wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/ (my blog)
On 01/09/2020 16:39, Joe Doupnik wrote:
Erick states this correctly. To give some numbers from my experience, here are two
slides from my presentation about installing Solr (https://netlab1.net/, locate item
"Solr/Lucene Search Service"):
<hbifonfjanlomngl.png>
<phnahkoblmojphjo.png>
Thus we see that a) experiments are the key, just as Erick says, and b) the
choice of garbage collection algorithm plays a major role.
In my setup I set SOLR_HEAP to 2048m, SOLR_OPTS includes -Xss1024k, plus the stock
GC_TUNE values. Your "memorage" may vary.
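In solr.in.sh terms, the setup described above would look roughly like this (the placement and comments are mine; GC_TUNE is simply left at the values Solr ships with):

```shell
# solr.in.sh -- settings as described above (illustrative fragment)
SOLR_HEAP="2048m"                  # sets both -Xms and -Xmx to 2048m
SOLR_OPTS="$SOLR_OPTS -Xss1024k"   # larger per-thread stack size
# GC_TUNE is left at the stock values shipped in solr.in.sh
```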
Thanks,
Joe D.
On 01/09/2020 15:33, Erick Erickson wrote:
You want to run with the smallest heap you can, due to Lucene's use of
MMapDirectory; see the excellent:
https://blog.thetaphi.de/2012/07/use-lucenes-mmapdirectory-on-64bit.html
There's also little reason to have different Xms and Xmx values; that just
means you'll eventually move a bunch of memory around as the heap expands.
I usually set them both to the same value.
How do you determine what "the smallest heap you can" is? Unfortunately there's
no good way outside of stress-testing your application with less and less memory
until you have problems, then adding some extra…
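As a concrete starting point in solr.in.sh, pinning Xms and Xmx together as Erick suggests might look like this; the 4g figure is only a placeholder, to be lowered or raised by the stress testing he describes:

```shell
# solr.in.sh -- pin min and max heap to the same value (sketch, not a recommendation)
SOLR_HEAP="4g"                     # bin/solr passes this as -Xms4g -Xmx4g
# Equivalent explicit form:
# SOLR_JAVA_MEM="-Xms4g -Xmx4g"
```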
Best,
Erick
On Sep 1, 2020, at 10:27 AM, Dominique Bejean <dominique.bej...@eolya.fr> wrote:
Hi,
As with all Java applications, the heap is regularly cleaned by the
garbage collector (some young items are moved to the old-generation heap zone,
and unused old items are removed from the old-generation heap zone). This
causes heap usage to continuously grow and shrink.
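You can watch this grow-and-shrink (sawtooth) pattern per generation with the JDK's jstat tool. The pid lookup below assumes a single local Solr started via Jetty's start.jar; adjust it for your setup:

```shell
# Print generation occupancy (%) every 5 seconds: survivor S0/S1, Eden E,
# Old O, Metaspace M, plus GC counts/times. Watch Eden fill and then drop
# at each young GC, while Old grows slowly between full collections.
SOLR_PID=$(pgrep -f start.jar)   # assumes one local Solr/Jetty process
jstat -gcutil "$SOLR_PID" 5000
```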
Regards
Dominique
Le mar. 1 sept. 2020 à 13:50, yaswanth kumar <yaswanth...@gmail.com> a
écrit :
Can someone help me understand how the % value in the Heap column is
calculated?
I created a new Solr cloud with 3 Solr nodes and one ZooKeeper. It is
not yet live, in terms of either indexing or searching, but I do see some
spikes in the HEAP column against nodes when I refresh the page multiple
times. It almost goes to 95% (sometimes) and then comes down to 50%.
Solr version: 8.2
Zookeeper: 3.4
The JVM size configured in solr.in.sh is a min of 1GB and a max of 10GB (actual
RAM size on the node is 16GB).
Basically, I need to understand whether I should worry about this fluctuating
heap % before making it live, or whether it is quite normal. This UI component
in Solr cloud is new to us, since we used to run Solr 5, where it didn't exist.
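For what it's worth, the Heap figure in the admin UI corresponds, as far as I know, to used versus max heap, which you can also read from the Metrics API (Solr 7+), e.g. curl "http://localhost:8983/solr/admin/metrics?group=jvm&prefix=memory.heap". A minimal sketch of the calculation, using an illustrative (not real) sample response:

```python
import json

# Illustrative sample of the jvm metrics group; real responses include
# additional fields and a responseHeader. The byte values are made up.
sample = '''{
  "metrics": {
    "solr.jvm": {
      "memory.heap.used": 1073741824,
      "memory.heap.max":  2147483648
    }
  }
}'''

jvm = json.loads(sample)["metrics"]["solr.jvm"]
heap_pct = 100.0 * jvm["memory.heap.used"] / jvm["memory.heap.max"]
print(f"heap usage: {heap_pct:.0f}%")   # prints: heap usage: 50%
```

A sawtooth in this number over time (rising, then dropping after a GC cycle) is normal JVM behavior, not by itself a sign of trouble.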
--
Thanks & Regards,
Yaswanth Kumar Konathala.
yaswanth...@gmail.com
Sent from my iPhone