Re: Understanding Solr heap %

2020-09-02 Thread Joe Doupnik
    That's good. I think I need to mention one other point about this
matter: feeding files into Tika (in my case) is paced to avoid overloads.
My crawler does this with a small adjustable pause (~100ms) after each file
submission, and longer ones (1-3 sec) after every 100 and 1000 submissions.
The crawler also runs at a lower priority than Solr, giving preference to
Solr.
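
In outline, the pacing amounts to something like the sketch below (class and
method names are hypothetical; only the intervals come from the description
above):

import java.util.concurrent.TimeUnit;

public class PacedSubmitter {
    private long submitted = 0;

    // Call once after each file is handed to the Tika/Solr pipeline.
    public void pace() throws InterruptedException {
        submitted++;
        TimeUnit.MILLISECONDS.sleep(100);   // small pause after every file
        if (submitted % 1000 == 0) {
            TimeUnit.SECONDS.sleep(3);      // long pause every 1000 submissions
        } else if (submitted % 100 == 0) {
            TimeUnit.SECONDS.sleep(1);      // shorter pause every 100 submissions
        }
    }
}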
    In the end we ought to run experiments to find and verify working 
values.

    Thanks,
    Joe D.

On 02/09/2020 03:40, yaswanth kumar wrote:

I now have a better understanding of my actual question. Thanks, all, for your
valuable explanations.

Sent from my iPhone


On Sep 1, 2020, at 2:01 PM, Joe Doupnik  wrote:

 As I have not received the follow-on message to mine, I will cut and paste it
below.
 My comments on that: the numbers are the numbers. More importantly, I have run
large imports (~0.5M docs) and watched as they progress. My crawler paces material
into Solr. Memory usage (Linux "top") shows small cyclic rises and falls,
peaking at about 2GB as the crawler introduces 1- and 3-second pauses every hundred
and every thousand submissions. The test shown in my original message is sufficient
to show the nature of Solr versions and the choice of garbage collector, and other
folks can do similar experiments on their gear. The quoted tests are indeed
representative of large and small amounts of various kinds of documents; I say that
based on much experience observing the details.
 Quibble about GC names if you wish, but please do see those experimental 
results. Also note the difference in our SOLR_HEAP values: 2GB in my work, 8GB 
in yours. I have found 2GB to work well for importing small and very large 
collections (of many file varieties).
 Thanks,
 Joe D.

This is misleading and not particularly good advice.

Solr 8 does NOT contain G1. G1GC is a feature of the JVM. We’ve been using
it with Java 8 and Solr 6.6.2 for a few years.

A test with eighty documents doesn’t test anything. Try a million documents to
get Solr memory usage warmed up.

GC_TUNE has been in the solr.in.sh file for a long time. Here are the settings
we use with Java 8. We have about 120 hosts running Solr in six prod clusters.

SOLR_HEAP=8g
# Use G1 GC  -- wunder 2017-01-23
# Settings from https://wiki.apache.org/solr/ShawnHeisey
GC_TUNE=" \
-XX:+UseG1GC \
-XX:+ParallelRefProcEnabled \
-XX:G1HeapRegionSize=8m \
-XX:MaxGCPauseMillis=200 \
-XX:+UseLargePages \
-XX:+AggressiveOpts \
"

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/  (my blog)
On 01/09/2020 16:39, Joe Doupnik wrote:
 Erick states this correctly. To give some numbers from my experiences, here are two
slides from my presentation about installing Solr (https://netlab1.net/, locate item
"Solr/Lucene Search Service"):

[slide images not included in the list archive]

 Thus we see a) experiments are the key, just as Erick says, and b) the 
choice of garbage collection algorithm plays a major role.
 In my setup I assigned SOLR_HEAP to be 2048m, SOLR_OPTS has -Xss1024k, plus stock 
GC_TUNE values. Your "memorage" may vary.
 Thanks,
 Joe D.


On 01/09/2020 15:33, Erick Erickson wrote:
You want to run with the smallest heap you can due to Lucene's use of
MMapDirectory; see the excellent:

https://blog.thetaphi.de/2012/07/use-lucenes-mmapdirectory-on-64bit.html

There’s also little reason to have different Xms and Xmx values; that just
means you’ll eventually move a bunch of memory around as the heap expands.
I usually set them both to the same value.

How do you determine what “the smallest heap you can” is? Unfortunately
there’s no good way outside of stress-testing your application with less and
less memory until you have problems, then adding some extra…
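
For reference, the SOLR_HEAP variable used elsewhere in this thread already
behaves this way (bin/solr expands it into equal -Xms/-Xmx); the explicit
solr.in.sh form would be something like this sketch, where 2g is purely
illustrative:

SOLR_JAVA_MEM="-Xms2g -Xmx2g"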

Best,
Erick


On Sep 1, 2020, at 10:27 AM, Dominique Bejean  wrote:

Hi,

As in all Java applications, the heap memory is regularly cleaned by the
garbage collector (some young items are moved to the old-generation heap zone,
and unused old items are removed from it). This causes heap usage to
continuously grow and shrink.
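
For anyone who wants to watch that sawtooth directly, here is a small
self-contained sketch using only the standard java.lang.management API
(nothing Solr-specific; the one-second, 30-sample loop is arbitrary):

import java.lang.management.ManagementFactory;
import java.lang.management.MemoryMXBean;
import java.lang.management.MemoryUsage;

public class HeapWatcher {
    public static void main(String[] args) throws InterruptedException {
        MemoryMXBean mem = ManagementFactory.getMemoryMXBean();
        for (int i = 0; i < 30; i++) {
            MemoryUsage heap = mem.getHeapMemoryUsage();
            long max = heap.getMax();   // -1 when no maximum is defined
            if (max > 0) {
                System.out.printf("heap used: %d%% (%d MB of %d MB)%n",
                        100 * heap.getUsed() / max,
                        heap.getUsed() >> 20, max >> 20);
            }
            Thread.sleep(1000);         // sample once per second
        }
    }
}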

Regards

Dominique




On Tue, Sep 1, 2020 at 13:50, yaswanth kumar  wrote:

Can someone help me understand how the % value in the Heap column is
calculated?

I created a new SolrCloud cluster with 3 Solr nodes and one ZooKeeper; it is
not yet live in terms of either indexing or searching, but I do see some
spikes in the HEAP column against nodes when I refresh the page multiple
times. It goes almost to 95% (sometimes) and then comes down to 50%

Solr version: 8.2
Zookeeper: 3.4

The JVM size configured in solr.in.sh is a min of 1GB and a max of 10GB (the
actual RAM size on the node is 16GB)

Basically I need to understand whether I should worry about this heap %
fluctuating before making it live, or whether it is quite normal. This UI
component in SolrCloud is new to us, since we previously ran Solr 5, where it
didn't exist.

Does a change in similarity class need reindexing

2020-09-02 Thread YOGENDRA SONI
Hi all,
I am experimenting with different parameters of BM25 and SweetSpot
similarity.
I changed the Solr field type definition as given below.
I need clarification on whether changing the similarity in a field type
requires reindexing or not.
{"replace-field-type":{
"name":"content_text",
"class":"solr.TextField",
"positionIncrementGap":"100",
"similarity":{"class":"solr.SweetSpotSimilarityFactory",
"min":3000,
"max":4000,
"steepness":0.5},
..
..
..
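
For reference, a command like the one above is posted to the Schema API. A
hedged sketch with the JDK 11 HttpClient follows; the host, port, and
collection name "mycoll" are placeholders, and the sections elided with ".."
in the original would still need to be included:

import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class ReplaceFieldType {
    public static void main(String[] args) throws Exception {
        // Mirrors the replace-field-type command quoted above; the
        // analyzer section elided there is likewise omitted here.
        String body = "{\"replace-field-type\":{"
                + "\"name\":\"content_text\","
                + "\"class\":\"solr.TextField\","
                + "\"positionIncrementGap\":\"100\","
                + "\"similarity\":{\"class\":\"solr.SweetSpotSimilarityFactory\","
                + "\"min\":3000,\"max\":4000,\"steepness\":0.5}}}";
        HttpRequest req = HttpRequest.newBuilder()
                .uri(URI.create("http://localhost:8983/solr/mycoll/schema"))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(body))
                .build();
        HttpResponse<String> resp = HttpClient.newHttpClient()
                .send(req, HttpResponse.BodyHandlers.ofString());
        System.out.println(resp.statusCode() + " " + resp.body());
    }
}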


Re: Understanding Solr heap %

2020-09-02 Thread Bernd Fehling
You should _not_ set "-XX:G1HeapRegionSize=n", because:
"... The goal is to have around 2048 regions based on the minimum Java heap
size."
The value of G1HeapRegionSize is calculated automatically at JVM start-up.

The parameter "-XX:MaxGCPauseMillis=200" is already the default.
What is the sense of explicitly setting a parameter to its default
value?
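
To see which region size the JVM actually picked, one option (a sketch,
assuming a HotSpot JVM; run it with the same heap and GC flags you give Solr,
e.g. -Xms2g -Xmx2g -XX:+UseG1GC) is the com.sun.management diagnostic bean:

import java.lang.management.ManagementFactory;
import com.sun.management.HotSpotDiagnosticMXBean;
import com.sun.management.VMOption;

public class RegionSize {
    public static void main(String[] args) {
        HotSpotDiagnosticMXBean bean =
                ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
        // G1 aims for roughly 2048 regions, so a 2g heap typically yields
        // 1 MB regions and an 8g heap 4 MB regions.
        VMOption opt = bean.getVMOption("G1HeapRegionSize");
        System.out.println(opt.getName() + " = " + opt.getValue() + " bytes");
    }
}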

Regards
Bernd


On 01.09.20 at 18:00, Walter Underwood wrote:
> This is misleading and not particularly good advice.
> 
> Solr 8 does NOT contain G1. G1GC is a feature of the JVM. We’ve been using
> it with Java 8 and Solr 6.6.2 for a few years.
> 
> A test with eighty documents doesn’t test anything. Try a million documents to
> get Solr memory usage warmed up.
> 
> GC_TUNE has been in the solr.in.sh file for a long time. Here are the settings
> we use with Java 8. We have about 120 hosts running Solr in six prod clusters.
> 
> SOLR_HEAP=8g
> # Use G1 GC  -- wunder 2017-01-23
> # Settings from https://wiki.apache.org/solr/ShawnHeisey
> GC_TUNE=" \
> -XX:+UseG1GC \
> -XX:+ParallelRefProcEnabled \
> -XX:G1HeapRegionSize=8m \
> -XX:MaxGCPauseMillis=200 \
> -XX:+UseLargePages \
> -XX:+AggressiveOpts \
> "
> 
> wunder
> Walter Underwood
> wun...@wunderwood.org
> http://observer.wunderwood.org/  (my blog)
> 
>> On Sep 1, 2020, at 8:39 AM, Joe Doupnik  wrote:
>>
>> Erick states this correctly. To give some numbers from my experiences, 
>> here are two slides from my presentation about installing Solr 
>> (https://netlab1.net/ , locate item "Solr/Lucene 
>> Search Service"):
>> [slide images not included in the archive]
>> Thus we see a) experiments are the key, just as Erick says, and b) the 
>> choice of garbage collection algorithm plays a major role.
>> In my setup I assigned SOLR_HEAP to be 2048m, SOLR_OPTS has -Xss1024k, 
>> plus stock GC_TUNE values. Your "memorage" may vary.
>> Thanks,
>> Joe D.
>>
>> On 01/09/2020 15:33, Erick Erickson wrote:
>>> You want to run with the smallest heap you can due to Lucene’s use of
>>> MMapDirectory; see the excellent:
>>>
>>> https://blog.thetaphi.de/2012/07/use-lucenes-mmapdirectory-on-64bit.html 
>>> 
>>>
>>> There’s also little reason to have different Xms and Xmx values; that just
>>> means you’ll eventually move a bunch of memory around as the heap expands.
>>> I usually set them both to the same value.
>>>
>>> How do you determine what “the smallest heap you can” is? Unfortunately
>>> there’s no good way outside of stress-testing your application with less
>>> and less memory until you have problems, then adding some extra…
>>>
>>> Best,
>>> Erick
>>>
 On Sep 1, 2020, at 10:27 AM, Dominique Bejean  wrote:

 Hi,

 As in all Java applications, the heap memory is regularly cleaned by the
 garbage collector (some young items are moved to the old-generation heap zone,
 and unused old items are removed from it). This causes heap usage to
 continuously grow and shrink.

 Regards

 Dominique




 On Tue, Sep 1, 2020 at 13:50, yaswanth kumar  wrote:

> Can someone help me understand how the % value in the Heap column is
> calculated?
>
> I created a new SolrCloud cluster with 3 Solr nodes and one ZooKeeper; it is
> not yet live in terms of either indexing or searching, but I do see some
> spikes in the HEAP column against nodes when I refresh the page multiple
> times. It goes almost to 95% (sometimes) and then comes down to 50%
>
> Solr version: 8.2
> Zookeeper: 3.4
>
> The JVM size configured in solr.in.sh is a min of 1GB and a max of 10GB
> (the actual RAM size on the node is 16GB)
>
> Basically I need to understand whether I should worry about this heap %
> fluctuating before making it live, or whether it is quite normal. This UI
> component in SolrCloud is new to us, since we previously ran Solr 5, where
> it didn't exist.
>
> --
> Thanks & Regards,
> Yaswanth Kumar Konathala.
> yaswanth...@gmail.com 
>
> Sent from my iPhone


Quick Question

2020-09-02 Thread William Morin
  Hi,
I was looking for some articles to read about "Schema Markup" today when I
stumbled on your page [
https://cwiki.apache.org/confluence/display/SOLR/UsingMailingLists ].
Very cool.
Anyway, I noticed that there is a mention of "Schema Markup" on your page, and
luckily it's my keyword. If you don't mind, could you give me a backlink for
"Schema Markup" on that page or in the resources section? It might be worth
adding to your page.
Thanks and have a great day!
Regards,
William


Re: Using Solr's zkcli.sh

2020-09-02 Thread Vincent Brehin
Hi Victor,
This is my first post to the list as well; although I have used Solr for a
long time, I am a recent subscriber.
I guess you used the install_solr_service.sh script for installing (either
directly or through an Ansible role or another wrapper).
IIRC this script removes the exec permission from some commands,
including zkcli.
So you should first run "sudo chmod a+x
server/scripts/cloud-scripts/zkcli.sh"; then you should be able to use the
command.
Let us know!
Vincent


On Tue, Sep 1, 2020 at 23:35, Victor Kretzer  wrote:

> Thank you in advance. This is my first time using a mailing list like this
> so hopefully I am doing so correctly.
>
> I am attempting to set up SolrCloud (Solr 6.6.6) and an external ZooKeeper
> ensemble on Azure. I have three machines dedicated to the ZooKeeper ensemble
> and two for Solr, all running Ubuntu 18.04 LTS. I've been relying on the
> following documents:
>
>
>   *   Taking Solr to Production<
> https://lucene.apache.org/solr/guide/6_6/taking-solr-to-production.html#taking-solr-to-production
> >
>   *   Enabling SSL<
> https://lucene.apache.org/solr/guide/6_6/enabling-ssl.html#enabling-ssl>
>
> I was able to complete the stand-alone portion of Enabling SSL on each of
> the solr machines and have successfully navigated to the Admin page using
> https://private.address/solr.
>
>
> I am now trying to complete the section "SSL with SolrCloud", but I cannot
> get past the Configure Zookeeper section. Whenever I try to run
> server/scripts/cloud-scripts/zkcli.sh it says:
> -bash: server/scripts/cloud-scripts/zkcli.sh: Permission denied
>
> I've tried using sudo server/...  but then it says:
> sudo: server/scripts/cloud-scripts/zkcli.sh: command not found
>
> What am I doing wrong? Any help getting this set up would be greatly
> appreciated.
>
> Thanks,
>
> Victor
>


RE: Using Solr's zkcli.sh

2020-09-02 Thread Victor Kretzer
Vincent --

Your suggestion worked perfectly. After using chmod I'm now able to use the 
zkcli script. Thank you so much for the quick save.

Victor



Victor Kretzer
Sitecore Developer
Application Services
GDC IT Solutions
Office: 717-262-2080 ext. 151

www.gdcitsolutions.com

-Original Message-
From: Vincent Brehin  
Sent: Wednesday, September 2, 2020 6:10 AM
To: solr-user@lucene.apache.org
Subject: Re: Using Solr's zkcli.sh

Hi Victor,
This is my first post to the list as well; although I have used Solr for a
long time, I am a recent subscriber.
I guess you used the install_solr_service.sh script for installing (either
directly or through an Ansible role or another wrapper).
IIRC this script removes the exec permission from some commands, including
zkcli.
So you should first run "sudo chmod a+x
server/scripts/cloud-scripts/zkcli.sh"; then you should be able to use the
command.
Let us know!
Vincent


On Tue, Sep 1, 2020 at 23:35, Victor Kretzer  wrote:

> Thank you in advance. This is my first time using a mailing list like 
> this so hopefully I am doing so correctly.
>
> I am attempting to set up SolrCloud (Solr 6.6.6) and an external
> ZooKeeper ensemble on Azure. I have three machines dedicated to the
> ZooKeeper ensemble and two for Solr, all running Ubuntu 18.04 LTS. I've
> been relying on the following
> documents:
>
>
>   *   Taking Solr to Production<
> https://lucene.apache.org/solr/guide/6_6/taking-solr-to-production.html#taking-solr-to-production
> >
>   *   Enabling SSL<
> https://lucene.apache.org/solr/guide/6_6/enabling-ssl.html#enabling-ssl>
>
> I was able to complete the stand-alone portion of Enabling SSL on each
> of the solr machines and have successfully navigated to the Admin page
> using https://private.address/solr.
>
>
> I am now trying to complete the section "SSL with SolrCloud", but I
> cannot get past the Configure Zookeeper section. Whenever I try to run
> server/scripts/cloud-scripts/zkcli.sh it says:
> -bash: server/scripts/cloud-scripts/zkcli.sh: Permission denied
>
> I've tried using sudo server/...  but then it says:
> sudo: server/scripts/cloud-scripts/zkcli.sh: command not found
>
> What am I doing wrong? Any help getting this set up would be greatly 
> appreciated.
>
> Thanks,
>
> Victor
>


Fwd: Does a change in similarity class need reindexing

2020-09-02 Thread YOGENDRA SONI
I changed the attributes and reloaded the collection, but the scores are not
changing; (norm(content_text)) is not changing either.

I also reindexed the documents, but the scores are still not changing.
Steps I followed:
1. Created fields using the default similarity:

created the content_text field type without a similarity section.
created the content field with type content_text.
2. Changed the field type definition to add a similarity:
"similarity":{"class":"solr.SweetSpotSimilarityFactory",
"min":100,
"max":2000,
"steepness":0.5}
 then indexed the documents.

3. Ran the query content:wireless and saved the scores.


4. Changed the field type definition to change min and max in the similarity:

{"replace-field-type":{
"name":"content_text",
"class":"solr.TextField",
"positionIncrementGap":"100",
"similarity":{"class":"solr.SweetSpotSimilarityFactory",
"min":3000,
"max":4000,
"steepness":0.5},
..
..
..

5. Reloaded the collection and ran the same query to see the change in ranking
and scores. Nothing changed.

6. Reindexed the documents; the results are still not changed.
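
For what it's worth, the min/max/steepness attributes control
SweetSpotSimilarity's length norm. A sketch of that curve, per the formula
documented in Lucene's SweetSpotSimilarity javadoc (worth verifying against
your Lucene version):

public class SweetSpotNorm {
    // 1 / sqrt(steepness * (|len-min| + |len-max| - (max-min)) + 1):
    // flat at 1.0 for lengths inside [min, max], decaying outside it.
    static double lengthNorm(int len, int min, int max, double steepness) {
        return 1.0 / Math.sqrt(steepness
                * (Math.abs(len - min) + Math.abs(len - max) - (max - min)) + 1.0);
    }

    public static void main(String[] args) {
        System.out.println(lengthNorm(3500, 3000, 4000, 0.5)); // 1.0 (inside the sweet spot)
        System.out.println(lengthNorm(100, 3000, 4000, 0.5));  // ~0.019 (far below min)
    }
}

Note that when and how this value is folded into stored norms differs across
Lucene versions, which is exactly why the reindexing question above matters.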


ApacheCon@home 2020 - Semantic Graph BoF

2020-09-02 Thread Claude Warren
Greetings,

ApacheCon is almost upon us.  This year it is online and free.  So please
make plans to attend.

This year Apache Jena is hosting a Semantic Graph "Birds of a Feather"
session[1] as part of the Jena track.  Please come join us and discuss all
things Semantic Graph.

[1] https://www.apachecon.com/acah2020/tracks/jena.html


Replication in soft commit

2020-09-02 Thread Tushar Arora
Hi,
I want to ask whether soft commits work with replication.
One of our use cases involves indexing data every second on a master
server, which then has to replicate to slaves. If we use soft commits, does
the data replicate to the slave server immediately, or only after a hard
commit takes place?
The use case requires transfer of data from master to slave immediately.

Regards,
Tushar
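
For context, master-side replication is typically configured with
replicateAfter events in solrconfig.xml; a minimal sketch along the lines of
the Solr reference guide (the confFiles list is illustrative):

<requestHandler name="/replication" class="solr.ReplicationHandler">
  <lst name="master">
    <!-- events on the master that trigger replication to slaves -->
    <str name="replicateAfter">commit</str>
    <str name="replicateAfter">startup</str>
    <str name="confFiles">schema.xml,stopwords.txt</str>
  </lst>
</requestHandler>

As I understand it, "commit" here refers to hard commit points, since those
are what create new index versions on disk.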